1Gbps or 10Gbps network cards?

(Christophe Chevalier) #1

I’m looking at building a small “home” test cluster, for cheap, but where I can still extract the most performance.

I’m looking at the ASRock Deskmini A300 (once they are available around here) as a good candidate: cheap, can fit a 4c/8t Ryzen 2400G, up to 32GB of RAM, and even has 2 x M.2 slots.

With a basic spec of 16 GB of RAM and a single Samsung 970 PRO, it can cost less than 500€ per node (probably cheaper in $). The only issue is that it has only a 1 Gbps network adapter, so about 100-120 MB/s of usable throughput, while the SSD can easily handle 1-2 GB/s.

On the other hand, I can also easily build a mini-ITX system for about 700-800€ that would include something like an Intel X550-T2, giving me 2 x 10 Gbps ports for around 2 GB/s of network throughput (though I’m a bit unsure whether the CPU could handle this), which would bring it in line with the SSD bandwidth.
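To make the "100-120 MB/s vs 1-2 GB/s" comparison concrete, here is a small back-of-the-envelope sketch that converts a link speed into usable MB/s and reports whether the network or the SSD is the bottleneck. The 90% efficiency factor and the 1500 MB/s SSD mid-point are hypothetical assumptions, not measurements.

```python
# Rough per-node bottleneck estimate: network link vs. SSD.
# Assumption (hypothetical): about 90% of the line rate is usable after
# Ethernet/IP/TCP overhead; real figures depend on MTU, NIC and CPU.


def usable_mb_per_s(link_gbps: float, ports: int = 1, efficiency: float = 0.9) -> float:
    """Convert a link speed in Gbit/s to usable MB/s (1 MB = 10**6 bytes)."""
    bits_per_s = link_gbps * 1e9 * ports
    return bits_per_s / 8 / 1e6 * efficiency


def bottleneck(network_mb_s: float, disk_mb_s: float) -> str:
    """Return which side limits a node that streams data from disk to the network."""
    return "network" if network_mb_s < disk_mb_s else "disk"


if __name__ == "__main__":
    ssd_mb_s = 1500  # mid-point of the 1-2 GB/s quoted for the NVMe drive
    for name, gbps, ports in [("1 Gbps onboard", 1, 1), ("X550-T2, 2 x 10 Gbps", 10, 2)]:
        net = usable_mb_per_s(gbps, ports)
        print(f"{name}: ~{net:.0f} MB/s usable, bottleneck: {bottleneck(net, ssd_mb_s)}")
```

Counting both X550-T2 ports assumes traffic is actually spread over both links (for example via bonding); with a single 10 Gbps port the estimate halves to about 1.1 GB/s.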

Does anyone have any thoughts on which is better:

  • More, cheaper nodes with 1 Gbps that don’t require a fast SSD (even SATA would be enough)
  • Fewer, higher-priced nodes with 10 Gbps (or more) that could fully utilize a single NVMe drive?

Requiring an extra PCIe card for the X550-T2 makes the cluster build more complex, takes up more space, and monopolizes the single PCIe slot of mini-ITX boards. This makes it difficult to use Ryzen CPUs like the 2600X or 2700X, which are not APUs and have no integrated graphics (headless Linux works, but is a pain if I need to troubleshoot something).

But at the same time, they are more efficient in terms of disk utilization…

(Markus Pilman) #2

This depends a lot on your workload - so it will be very hard for anyone to give you a straight answer.

But in general I would go for the cheaper one for several reasons:

  1. FDB is better at scaling out than scaling up, so it will generally have a harder time saturating a few large nodes than many small ones. With the cheaper nodes you could potentially buy more of them.
  2. In our experience, network saturation is almost never an issue. FDB is an OLTP system, and such systems tend to optimize for latency, not for throughput. So network latency is usually much more important than network throughput.

But keep in mind that these two statements are very general and certainly not true for all use-cases. When you size machines in your cluster, keep the following in mind:

  • Each process should have at least 8 GiB of memory. You will probably also need some memory for the OS. For testing you might want to bend the rules a bit (otherwise your 16 GiB machines will only be able to utilize one core). This makes sizing a bit difficult, as it means that you need more than 8 GiB of memory per CPU core you want to utilize.
  • Most FDB workloads run many single-point lookup operations, so the amount of data you read from disk will be much larger than the amount of data you send over the network. Also, most reads will be non-contiguous. How much this matters on modern SSDs I don’t know.
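The "more than 8 GiB per core" rule can be turned into a quick sizing calculation. This is a minimal sketch: the 8 GiB default comes from the guideline above, while the 2 GiB OS reserve is a hypothetical allowance you should adjust.

```python
# Sizing sketch: how many fdbserver processes fit on a host?
# per_process_gib defaults to the 8 GiB guideline from this thread;
# os_reserve_gib (2 GiB) is a hypothetical allowance for the OS.


def max_processes(host_ram_gib: float, per_process_gib: float = 8, os_reserve_gib: float = 2) -> int:
    """Number of processes that fit after reserving memory for the OS."""
    usable = host_ram_gib - os_reserve_gib
    if usable <= 0:
        return 0
    return int(usable // per_process_gib)


def ram_needed_gib(processes: int, per_process_gib: float = 8, os_reserve_gib: float = 2) -> float:
    """Host RAM required to run `processes` processes plus the OS reserve."""
    return processes * per_process_gib + os_reserve_gib


if __name__ == "__main__":
    for ram in (16, 32, 64):
        print(f"{ram} GiB host -> {max_processes(ram)} process(es) at 8 GiB each")
    print(f"4 processes need {ram_needed_gib(4):.0f} GiB")
```

Under these assumptions a 16 GiB host runs a single process, matching the "one core" remark, and even a 32 GiB host falls one process short of four once the OS gets its share.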

(Christophe Chevalier) #3

Thanks for the insight.

Each process should have at least 8GB of memory

That seems a bit much? My workloads have always been small compared to others out there, but I’ve rarely seen a storage process get over 4 GB. Do you think the 8 GB is required to handle huge datasets, or is it required for performance reasons? I don’t plan on having more than a few hundred GB total on this cluster.

Given the price of RAM (100€ per 16 GB for me currently), that would mean 32 GB per node to get only 4 processes on a single fast disk, making memory more than a third of the node’s price. I was planning something like 16 GB RAM / 4 processes / 1 disk on a 4c/8t processor.
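The "more than a third" figure follows from the prices quoted here. This is a minimal sketch, assuming that upgrading from 16 GB to 32 GB simply adds more 16 GB kits at about 100€ each.

```python
# Share of a node's price that goes to RAM, using the figures quoted above
# (under 500 EUR for a 16 GB node, about 100 EUR per 16 GB of RAM).
# Assumption: going from 16 GB to 32 GB just adds 16 GB kits at that price.


def ram_cost_share(base_node_eur: float, base_ram_gb: int, target_ram_gb: int,
                   eur_per_16gb: float = 100) -> float:
    """Fraction of the upgraded node price spent on memory."""
    extra_kits = (target_ram_gb - base_ram_gb) / 16
    node_price = base_node_eur + extra_kits * eur_per_16gb
    ram_price = target_ram_gb / 16 * eur_per_16gb
    return ram_price / node_price


if __name__ == "__main__":
    for base in (450, 500):
        share = ram_cost_share(base, 16, 32)
        print(f"{base} EUR base node upgraded to 32 GB: {share:.0%} of the price is RAM")
```

At exactly 500€ the share is one third; any base price below that pushes it above a third.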

In our experience, network saturation is almost never an issue

What about when the cluster is rebalancing data, or when performing a backup? Being limited to 1 Gbps means that these operations will take 10x longer. I’ve seen clusters busy rebalancing for a long time, and was thinking 10 Gbps might help with that.
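The "10x longer" argument can be checked with a lower-bound transfer-time estimate. This sketch assumes the link is the only bottleneck and that about 90% of the line rate is usable (both hypothetical); the 300 GB figure is just an illustration of "a few hundred GB".

```python
# Lower-bound time to push a data set through one link.
# Assumptions (hypothetical): the link is the only bottleneck and about 90% of
# the line rate is usable; actual rebalancing or backup is often slower.


def transfer_minutes(data_gb: float, link_gbps: float, efficiency: float = 0.9) -> float:
    """Minutes to move data_gb gigabytes (10**9 bytes) over a link_gbps link."""
    usable_bytes_per_s = link_gbps * 1e9 / 8 * efficiency
    return data_gb * 1e9 / usable_bytes_per_s / 60


if __name__ == "__main__":
    data_gb = 300  # "a few hundred GB", illustrative only
    for gbps in (1, 10):
        print(f"{data_gb} GB over {gbps} Gbps: ~{transfer_minutes(data_gb, gbps):.0f} min")
```

The 10x ratio only holds while the link is the limiting factor; if data movement is throttled elsewhere, a faster NIC will not shorten it.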

(Markus Pilman) #4

I can’t tell you with confidence whether 4GB will be enough or not. I would recommend running some benchmarks. You should also include the following in your foundationdb.conf file (otherwise you might start to swap, and you really don’t want that):

memory = 4GiB

Keep in mind that the storage queue will use up to 1 GB of memory. If you make this smaller you will limit your maximum write throughput.

If you use 4 processes with 4GB each on a machine with 16GB of main memory you will leave exactly 0 bytes for the operating system. I don’t think this is a good idea.
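The "exactly 0 bytes" point is easy to check mechanically. This sketch parses size strings such as `4GiB` and subtracts the per-process limits from host RAM. The accepted suffixes are an assumption of this sketch and are not a copy of fdbserver's own size parser.

```python
import re

# Size suffixes understood by this sketch (an assumption: this is not a copy of
# fdbserver's own parser). Binary units use 1024, decimal units use 1000.
_UNITS = {
    "": 1,
    "KIB": 1024, "MIB": 1024 ** 2, "GIB": 1024 ** 3,
    "KB": 1000, "MB": 1000 ** 2, "GB": 1000 ** 3,
}


def parse_size(text: str) -> int:
    """Parse strings such as '4GiB' or '512 MiB' into a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = m.groups()
    factor = _UNITS.get(unit.upper())
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(value) * factor)


def os_headroom(host_ram: str, per_process: str, processes: int) -> int:
    """Bytes left for the OS once every process reaches its memory limit."""
    return parse_size(host_ram) - processes * parse_size(per_process)


if __name__ == "__main__":
    left = os_headroom("16GiB", "4GiB", 4)
    print(f"headroom: {left / 1024 ** 3:.1f} GiB")  # 0.0 GiB: nothing left for the OS
```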

I would suggest you test this on rented machines. You could use AWS or Azure to allocate a cluster with machines of various sizes and do benchmarks whether this will work for you.

Sadly, data movement has always been pretty slow for us, so it never created a lot of network load. But we are running on EBS, so your experience might be very different.

(Christophe Chevalier) #5

I did not specify, but I was gonna use the ssd-2 storage engine, so maybe that’s why I’m not seeing usage greater than 4 GiB?

Even after a heavy workload, I usually see only about 3.6-3.7 GiB of memory usage reported by storage processes, and usually less than 1 GiB reported by log processes. This leads me to believe that I could fit 3 storage processes, 1 log process and maybe 2 stateless processes on a 16 GB host by pushing it a little.
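The proposed layout can be summed up as a simple memory budget. The storage and log figures are the observations quoted here; the 0.5 GiB per stateless process is a hypothetical placeholder, not a measurement.

```python
# Memory budget for the proposed 16 GiB layout.
# Storage and log figures are the observed values quoted above; the
# stateless figure is a hypothetical placeholder, not a measurement.

HOST_RAM_GIB = 16.0


def budget_gib(layout: dict[str, tuple[int, float]]) -> float:
    """Total memory for a {role: (process_count, GiB_per_process)} layout."""
    return sum(count * gib for count, gib in layout.values())


layout = {
    "storage": (3, 3.7),    # observed 3.6-3.7 GiB per storage process
    "log": (1, 1.0),        # observed: usually under 1 GiB
    "stateless": (2, 0.5),  # hypothetical placeholder
}

if __name__ == "__main__":
    used = budget_gib(layout)
    print(f"processes: {used:.1f} GiB, left for OS: {HOST_RAM_GIB - used:.1f} GiB")
```

Raising the storage figure to 5 GiB per process, for example, already pushes the total past 16 GiB, which shows how little slack this layout has.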

I’ve never had the opportunity to do much benchmarking with the memory engine, which might require 8 GiB per process as you observed?

(gaurav) #6

I usually see up to 4.5 GB of memory usage by fdb processes when I put the tlog and storage server on the same process (ssd-2, double replication). I have seen this go up to 5.3 GB (ver 6.15).
Not sure if this is due to the tlog and storage server roles being on the same process (both require memory for queues).

(Christophe Chevalier) #7

So this really confirms that memory consumption is very volatile, depending on the workload and architecture, and can range from 3.9 to 5.3 to 8 GB… so back to square one :)

I guess I can only arbitrarily decide on a memory capacity for each node, and “deal with it” if I underestimated what is really needed.

For boards with 4 DIMM slots, I can just populate 2 now and maybe add 2 more later if required.

For smaller boards with only 2 DIMM slots, this is a bit more problematic. A single DIMM running in single-channel mode would probably be slower than two DIMMs in dual-channel mode. And if I fill both slots now (2 x 8 GB), I would have to replace both DIMMs if I need 32 GB (2 x 16 GB).

(gaurav) #8

This one shows 5.5 GB!

(Christophe Chevalier) #9

Do you have any easy way to generate the same load on a test cluster that I could use to compare with others?

(Alex Miller) #10

There are also many memory-related knobs that are all set to default values assuming 8 GB as the default amount of memory for a process. A brief skim reveals --knob_server_mem_limit, --knob_commit_batches_mem_bytes_hard_limit, --knob_tlog_spill_reference_max_peek_memory_bytes, --knob_resolver_state_memory_limit, and others.
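If you do shrink the per-process budget, one option is to scale those byte-valued knobs proportionally. This is only a sketch under several assumptions: that the defaults were sized for an 8 GiB process (as described here), that linear scaling is appropriate (unverified), and that you supply the real default values for your FDB version yourself. The output lines are meant for the [fdbserver] section of foundationdb.conf, assuming fdbmonitor forwards each key there to fdbserver as a --knob_... flag.

```python
# Sketch: scale memory-related knobs for a smaller per-process budget.
# Assumptions: the defaults were sized for an 8 GiB process, linear scaling is
# appropriate (unverified), and the default values are supplied by you from
# your FDB version; nothing here is hardcoded as a real default.

GIB = 1024 ** 3


def scaled_knob_lines(defaults: dict[str, int], target_bytes: int,
                      assumed_default_bytes: int = 8 * GIB) -> list[str]:
    """Return 'knob_<name> = <value>' lines scaled by the target/assumed ratio."""
    ratio = target_bytes / assumed_default_bytes
    return [f"knob_{name} = {int(value * ratio)}" for name, value in sorted(defaults.items())]


if __name__ == "__main__":
    # Placeholder default value: look up the real one for your version.
    defaults = {"server_mem_limit": 8 * GIB}
    for line in scaled_knob_lines(defaults, 4 * GIB):
        print(line)  # knob_server_mem_limit = 4294967296
```

Only knobs that are byte counts should go through this; counts, ratios, or timeouts should not be scaled this way.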