Production Hardware

Hi
We are investigating running FoundationDB in production on bare metal.
Reading through this forum and the docs, we can’t really find recommended specs for servers.

If we have a scenario of about 5k-10k writes per second and 10 range reads per second,
what kind of hardware should we look at?

  1. How many CPU cores are recommended per server?
  2. Should it always be one storage process per SSD device or can we have multiple based on testing?
  3. Is it possible to run a transaction log process on the same server as a storage process, as long as they use different devices?
  4. How much RAM is recommended per process?
  5. If we want to run triple redundancy, should we have at least 5 or 7 servers?

Thanks

I don’t think there’s any particular recommendation for core count. You could run a lot of processes per server or just a few, as long as you have enough disks, memory, network bandwidth, etc. to support them all.

You can run more than one storage process per disk. I think this can lead to some fragmentation issues, which we’ve previously seen manifest as poor performance when the data files get deleted (a rare event, and something the software now tries to deal with by slowly truncating the file rather than deleting it outright). If you are CPU bound on the storage servers and have the cores to spare, running extra storage processes can be a reasonable choice. You should be careful, though, about running more processes than you have physical cores. If you end up saturating the machine’s CPU, that would be bad for performance.

I should also note that I think you’ll want to run the same number of processes across all of your disks. This is because data distribution tries to evenly distribute data to all of the processes, and if some processes have a whole disk and some only have half, the processes with a whole disk will be artificially limited.
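To make the imbalance concrete, here is a rough Python sketch. It assumes, as a simplification rather than a description of FoundationDB's actual data distribution algorithm, that every storage process ends up holding the same amount of data and that processes sharing a disk split it evenly. The function name `usable_capacity_gb` is made up for this illustration.

```python
def usable_capacity_gb(disks):
    """Estimate usable capacity given (disk_capacity_gb, storage_processes) pairs.

    Simplifying assumption: every storage process ends up holding the same
    amount of data, so the process with the least disk space sets the limit.
    """
    # Space available to one process on each disk; the smallest share wins.
    per_process_limit = min(capacity / processes for capacity, processes in disks)
    total_processes = sum(processes for _, processes in disks)
    return per_process_limit * total_processes


# Two 1000 GB disks, one with 1 storage process and one with 2.
uneven = usable_capacity_gb([(1000, 1), (1000, 2)])
# Same disks with 2 storage processes each.
even = usable_capacity_gb([(1000, 2), (1000, 2)])
print(uneven, even)  # 1500.0 2000.0
```

Under these assumptions the uneven layout leaves about a quarter of the raw space unused, because the process that has a whole disk to itself is held to the same data size as the ones sharing a disk.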

It can also be a bit of a challenge to change the number of processes you have running on a disk if you have a lot of data. For example, if you are changing from 2 processes per disk to 1, the processes that remain are going to grow larger as the extras are removed. However, until the extras are actually removed, the growing processes might run out of space. The safest way to deal with this if your disks are somewhat full is to migrate to entirely new hardware that’s set up the way you want.
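A back-of-the-envelope check for that risk, as a Python sketch. It assumes the pessimistic case where the processes being removed give back no disk space until they are gone, while the remaining processes on the same disk have already grown to hold all of that disk's data. Real behavior will likely be less severe. `peak_disk_usage_gb` and `is_safe` are hypothetical helpers, not part of any FoundationDB tooling.

```python
def peak_disk_usage_gb(used_gb, old_processes, new_processes):
    """Worst-case transient usage of one disk while reducing processes per disk.

    Pessimistic assumption: removed processes free no space until they are
    deleted, and the remaining processes have already absorbed all the data.
    """
    if not 0 < new_processes <= old_processes:
        raise ValueError("expected 0 < new_processes <= old_processes")
    removed_fraction = (old_processes - new_processes) / old_processes
    # Final data on the disk plus the not-yet-deleted files of removed processes.
    return used_gb + used_gb * removed_fraction


def is_safe(capacity_gb, used_gb, old_processes, new_processes):
    """True if the worst-case peak still fits on the disk."""
    return peak_disk_usage_gb(used_gb, old_processes, new_processes) <= capacity_gb


# Going from 2 processes per disk to 1 on a 1000 GB disk.
print(is_safe(1000, 600, 2, 1))  # True: peak is 900 GB
print(is_safe(1000, 700, 2, 1))  # False: peak is 1050 GB
```

By this estimate, a 2-to-1 change is only comfortable when the disks are less than about two-thirds full, which is why moving to new hardware is the safer route for fuller disks.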

A transaction log that shares a disk with a storage server is bad for performance, but if they use separate disks it’s fine to have them on the same machine.

The quoted minimum is 4GB per process, which may work OK if your load is light. I’d generally favor having 8GB per process, though.
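Putting the memory figure together with the earlier advice about not running more processes than physical cores, a quick sanity check for a planned machine could look like the Python sketch below. The function `check_machine_plan` and its parameters are invented for illustration, and it leaves out any headroom for the operating system, which you would want to add on top.

```python
def check_machine_plan(physical_cores, ram_gb, processes, gb_per_process=8):
    """Return a list of warnings for a planned number of fdbserver processes."""
    warnings = []
    # More processes than physical cores risks saturating the CPU.
    if processes > physical_cores:
        warnings.append(
            f"{processes} processes exceed {physical_cores} physical cores"
        )
    # Budget gb_per_process of RAM for every process on the machine.
    needed_gb = processes * gb_per_process
    if needed_gb > ram_gb:
        warnings.append(f"need {needed_gb} GB RAM, machine has {ram_gb} GB")
    return warnings


print(check_machine_plan(physical_cores=8, ram_gb=64, processes=8))
# []
print(check_machine_plan(physical_cores=8, ram_gb=32, processes=8))
# ['need 64 GB RAM, machine has 32 GB']
```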

For triple redundancy, you’ll need at least 5 separate machines for your coordinators. You also need at least 3 machines to recruit your transaction logs, so having 5 lets you tolerate two machine failures for that purpose as well. Having more can be nice because it reduces the urgency of dealing with certain types of faults, but it isn’t necessary.
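The arithmetic behind those numbers, as a small Python sketch. It assumes that a majority of coordinators has to stay reachable and that triple redundancy needs 3 machines available for transaction logs. Both function names are made up for this example.

```python
def coordinator_failures_tolerated(coordinators):
    """Failures tolerated while a majority of coordinators stays reachable."""
    return (coordinators - 1) // 2


def log_machine_failures_tolerated(machines, log_replicas=3):
    """Machines that can fail while still leaving enough to recruit logs."""
    return max(machines - log_replicas, 0)


for n in (3, 5, 7):
    print(n, coordinator_failures_tolerated(n), log_machine_failures_tolerated(n))
# 3 1 0
# 5 2 2
# 7 3 4
```

With 5 machines both numbers come out to 2, which matches the two-fault tolerance described above. Going to 7 adds slack but does not change the minimum.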
