FoundationDB cluster setup

Hi Everyone,

I’m quite new to FoundationDB and want to set up a 3-machine cluster with process classes. What are the required specs, and which process classes should be used for good performance?

Any help will be much appreciated!


How many disks do these machines each have?


How many should I use?

If you can provide

a) the number of machines (which you gave as 3)
b) the number of cores each of those machines have
c) the number of disks attached to each machine (preferably dedicated for FDB)

Then I will provide you a suggested process layout and configuration on your cluster. Otherwise, my answers are just going to depend on me guessing what your environment is.


If you’re looking for general advice, searching the forums for “process class” will yield a number of previous threads like Cluster tuning cookbook which offers a lot of detail on recommended cluster setup.

This is a topic that I hope we improve in the official documentation sometime…


t2.xlarge: 4 cores, 16 GB of RAM, and 2×100 GB SSDs for FDB

Ah, thank you for including memory, which I should have also asked.

fdbserver expects to be able to use up to 8GB of RAM per process, so that’d be 2 fdbserver processes per host.

I think reserving any disk for tlog here would probably be too much of a tax, as it would cut your total amount of usable disk space in half. So I’d suggest just leaving them with process class unset (so that they can become anything), and have one pointed to each ssd. You’ll end up with a warning from fdbcli> status about poor performance due to log and storage sharing a process.
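As a sketch, a per-host foundationdb.conf for that layout might look like the following (the paths, ports, and mount points here are assumptions; adjust for your environment):

```ini
; /etc/foundationdb/foundationdb.conf fragment (sketch; paths/ports are assumptions)
[fdbserver]
command = /usr/sbin/fdbserver
logdir = /var/log/foundationdb

; One fdbserver process per SSD, class left unset so each process
; can be recruited as storage, log, proxy, etc. as needed.
[fdbserver.4500]
datadir = /mnt/ssd1/foundationdb/data/4500

[fdbserver.4501]
datadir = /mnt/ssd2/foundationdb/data/4501
```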

Then fdbcli> configure double and distribute a cluster file with one process from each host as coordinator.
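Concretely, that could be done from fdbcli like so (a sketch; the addresses below are placeholders for your actual host IPs):

```shell
# Set redundancy mode to double with the ssd storage engine
fdbcli --exec "configure double ssd"
# Pick one process on each host as a coordinator (placeholder addresses)
fdbcli --exec "coordinators 10.0.0.1:4500 10.0.0.2:4500 10.0.0.3:4500"
# Then distribute the updated fdb.cluster file to every server and client.
```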

If you ran with a slightly larger host that had 24GB of RAM, you could then run 1 transaction and 2 storage processes per host, with the transaction process and one of the storage processes sharing a disk. Splitting storage out from tlog/proxy/resolver would probably give you a bit better performance, due to not taxing one process quite as hard.

If you need good performance, and care less about the total data volume that you can store, then you’d be better off doing 1 transaction process and one storage process per host. We generally talk about the transaction authority as a 10% tax, so if you started scaling out past 7-8 hosts in this same configuration, I’d take two hosts and convert one of their disks to transaction class.
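As a sketch of how process classes are assigned in foundationdb.conf (ports and paths here are assumptions):

```ini
; foundationdb.conf fragment (sketch; ports/paths are assumptions)
[fdbserver.4500]
datadir = /mnt/ssd1/foundationdb/data/4500
class = transaction

[fdbserver.4501]
datadir = /mnt/ssd2/foundationdb/data/4501
class = storage
```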


Thanks for the detailed reply, Alex!

And for a latency test, I should use only one instance, right?

We have planned to use YCSB for testing. Do you have any idea how to configure this?

I would expect that for benchmarking a three machine cluster, you could probably get away with only having one (other) machine driving the benchmarking load.

Someone did contribute a YCSB backend for FoundationDB. However, YCSB isn’t a great benchmarking tool for transactional databases, and particularly not for FoundationDB. In a transaction, the first operation done pays a cost of fetching and calculating a read version, which is frequently a 1-2ms operation. A transaction can then do any number of reads within 5 seconds, with a point read typically taking ~0.3ms. YCSB does one read or one write per transaction, so it ends up turning into a benchmark of getting read versions instead of reads or writes to the database. See How to scale foundation db reads for a deep analysis of this sort of workload.
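A rough back-of-the-envelope model makes the point, using the latency figures above (the exact numbers are assumptions for illustration):

```python
# Rough transaction latency model (numbers are illustrative assumptions:
# ~1.5 ms to fetch a read version, ~0.3 ms per point read).
GRV_MS = 1.5   # read-version fetch, paid once per transaction
READ_MS = 0.3  # typical point read

def txn_latency_ms(reads_per_txn):
    """Total latency of a transaction doing N point reads."""
    return GRV_MS + reads_per_txn * READ_MS

# YCSB-style: one operation per transaction, so the read-version
# fetch dominates the total latency.
print(txn_latency_ms(1))        # -> 1.8 (of which 1.5 ms is GRV)

# Batching 10 reads into one transaction amortizes the GRV cost.
print(txn_latency_ms(10) / 10)  # -> 0.45 ms per read
```

So a one-op-per-transaction benchmark mostly measures read-version acquisition, not read/write throughput.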

So YCSB will give you an idea of how a YCSB-like workload would run on FDB, but not anything that cleanly extrapolates to what your application would do with FDB. If you’d still like to run YCSB, then our friends at PingCAP very generously included FDB in their Go port of YCSB as well (ANN: benchmark FoundationDB with Go YCSB). There are benchmarks you can find that were meant to be better tests of transactional databases: YCSB+T, TCSB, or TPC-C, but I’m not aware of an FDB backend for any of them.

We’ve previously encouraged folk to use the same workload generating tools that we use for performance tests internally. You can find some examples of doing so in Why doesn't my cluster performance scale when I double the number of machines? and searching for “multitest” gives other threads as well.


Thanks a ton, Alex. I will go through all these and get back to you.

Alex, one more question: when does Ratekeeper slow down processes? I understand that it does so under heavy load, but is there a specific metric past which Ratekeeper slows everything down?

Heavy load is the usual cause, which can be determined from queue sizes among other things. Another possibility which could be what you’re hitting is running out of space on a disk (or in memory for the memory storage engine). Running status in fdbcli should tell you what the workload is limited by, and if you want help interpreting it you could paste your status output here.
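For reference, status can also be captured non-interactively for pasting or scripting (a sketch, assuming the default cluster-file location):

```shell
# "status details" adds per-process metrics (CPU, disk, queues)
fdbcli --exec "status details"
# Machine-readable form, convenient for sharing or tooling
fdbcli --exec "status json" > status.json
```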


I don’t have the output now, but when I do benchmarking with YCSB, the benchmark starts at (for example) 800 ops/sec and slows down when the disk usage crosses 70% (I believe). I was wondering what the reason would be.

We have often seen that disk IOPS themselves degrade as a disk fills up; that could be one of the things to check. I don’t see anything within FDB that would slow down the TPS when a disk reaches ~70% full.