Cluster tuning cookbook

alexmiller · June 22, 2018, 10:47pm

Each fdbserver process is single-threaded, so it’s generally a good idea to run as many fdbserver processes as roles that you’ll end up recruiting (and extra fdbserver processes will end up just being idle). I’d suggest only pointing one fdbserver process per machine to the disk that you have though. (Unless the one process isn’t enough to saturate your disk, which is sometimes the case. Then point two per machine to the disk.)

Recruitment isn’t always the best at assigning roles in a way that benefit performance the most. (Partly due to lack of intelligence, partly due to lack of topology information.). For our own performance tests, we explicitly assign each process a process class that’s restrictive enough the strongly guide recruitment to a good recruitment.

In particular, I’d recommend making sure that…
(1) Each proxy is running on a different host
(2) The transaction log isn’t sharing a disk with a storage server
(3) Assign an explicit process to server as the cluster controller, and another to serve as the master

As you have eight hosts, you can probably lay out a process configuration that looks like:

1: proxy storage
2: proxy storage
3: resolution storage
4: log
5: log
6: storage cluster_controller
7: storage master
8: storage

Assuming that you’re targeting a proxies=2 logs=2 config.

The goal of this layout is to consider the network bandwidth limits that are available, and maximizing the possible throughput. Commits come in to the proxies and flow to the transaction logs. By having a dedicated network link for both proxies and both logs, we make sure we haven’t accidentally cut our possible bandwidth in half. By assigning explicit process classes for each role, we make sure that we aren’t competing with another role recruited in the same process for CPU time, which would noticeably increase latency. Assigning only one storage role per disk means that we aren’t competing for IO bandwidth (or fsync’s).

I’d suggest looking at the total CPU, network, and disk usage on the relevant machines during your test to give you a hint on how to further tweak the best logs and proxies setting for your config. It might make sense to run 2 proxies on a host if your one proxy process is very high on CPU usage, but not fully utilizing the network, or running an extra log might turn out to increase your throughput.

Topic		Replies	Views
Performance tuning and availability Running FoundationDB performance	0	365	June 10, 2022
Why doesn't my cluster performance scale when I double the number of machines? Using FoundationDB performance	20	3262	August 17, 2018
Cluster scaling Using FoundationDB performance	0	267	November 20, 2023
Understanding about testing class and performance tuning in Foundationdb Kubernetes Operator performance	2	586	February 15, 2023
Transaction/operation throughput Using FoundationDB performance	10	1943	January 23, 2020

Cluster tuning cookbook

Related topics