We deployed the application on 5 hosts (that is, 5 standalone instances) and ran a stress test (simple read transactions only). We observed the following:
the CPU usage of each FoundationDB grv_proxy reached 95%
the CPU usage of each FoundationDB stateless process (transaction log server) reached 60%
Then we increased the number of grv_proxy processes from 13 to 30 and observed the following:
the CPU usage of each FoundationDB grv_proxy reached 80%
the CPU usage of each FoundationDB stateless process (transaction log) reached 85%
I tried many methods but still couldn't lower the CPU usage. Then one day I found that decreasing 'ClientThreadsPerVersion' from 10 to 5 significantly reduced CPU usage, from 95% to 58%. So my questions are:
Why does increasing the number of grv_proxy processes not significantly reduce the CPU usage of the grv_proxy servers, and why does it have the side effect of increasing the CPU usage of the transaction log servers?
Why does decreasing 'ClientThreadsPerVersion' significantly reduce CPU usage?
FYI, stateless processes cannot be transaction logs, but they can be transaction proxies, which is what you are seeing.
You have too many proxies. Usually 2-3 should be enough.
When you start a read transaction, there is a GRV request. Each request will contact all commit proxies (I believe), which are the stateless processes you have. This introduces a quadratic cost that scales with the number of GRV and commit proxies.
Why does decreasing 'ClientThreadsPerVersion' significantly reduce CPU usage?
Clients do their own GRV batching as well.
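For what it's worth, here is a minimal sketch of where that knob sits on the client side, assuming the Python bindings and API version 710; ClientThreadsPerVersion maps to the client_threads_per_version network option, and the exact option name and requirements (e.g. external client libraries) may differ in your client version:

```python
import fdb

fdb.api_version(710)

# Assumption: the application controls ClientThreadsPerVersion through the
# client_threads_per_version network option, set before the network starts.
# Each client thread runs its own network loop and does its own GRV
# batching, so 10 threads generate roughly twice as many independent GRV
# request streams as 5 threads for the same workload.
fdb.options.set_client_threads_per_version(5)

# The multi-threaded client usually also needs external client libraries,
# e.g. fdb.options.set_external_client_directory("/path/to/fdb/clients")
# (hypothetical path).

db = fdb.open()  # open the database only after the options above are set
```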
Have you tried with 2-3 GRV proxies and 1-2 commit proxies?
Have you tried with 2-3 GRV proxies and 1-2 commit proxies?
I haven't tried that yet. In my production environment, the expected data volume is 500 TB, with an anticipated KV read QPS of 300K and KV write QPS of 100K. I'm now planning to run a stress test; do you have any recommendations for the number and ratio of GRV proxies to commit proxies?
For ideal performance you’ll need to do a bunch of tuning.
Writes
The FDB write path works by your client sending the transaction to a commit proxy. The commit proxy then forwards the conflict ranges to the resolver, which either allows or rejects the transaction depending on whether there are any conflicts.
From there the transaction is sent to the transaction logs. (All of these processes make up the "transaction system".)
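As a concrete illustration of that path (just a sketch using the Python bindings; the key names are made up and assumed to already hold integer byte strings):

```python
import fdb

fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def transfer(tr, src, dst, amount):
    # The keys read here become the transaction's read-conflict ranges.
    src_balance = int(tr[src])
    dst_balance = int(tr[dst])
    tr[src] = str(src_balance - amount).encode()
    tr[dst] = str(dst_balance + amount).encode()
    # At commit time the client sends the transaction to a commit proxy,
    # which forwards the conflict ranges to the resolver. If a concurrently
    # committed transaction wrote these keys, the commit is rejected and
    # the @fdb.transactional decorator retries the function.

transfer(db, b"balance/alice", b"balance/bob", 10)
```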
This is my advice on how to configure the processes you run (see the foundationdb.conf sketch after this list):
Do not put TLog processes on the same servers as commitproxy/resolver processes.
Doing so can cut your throughput significantly, as you would be putting two bandwidth-intensive loads on the same host and not fully utilizing cluster capacity.
Each TLog process should have its own disk, not shared with another TLog or storage process. Stateless processes are fine.
Sharing a disk will increase flush latencies, which slows down transaction processing.
Put your resolution and commitproxy processes on hosts with high CPU performance and enough network bandwidth to handle the load.
You will always want a few more processes than you configure your cluster for, in case of failures. You should be able to tolerate at least 1 host failing without performance degradation.
Put resolution and commitproxy on separate hosts if you can.
If you cannot allocate dedicated capacity, you can put the resolution and commitproxy processes alongside storage servers.
You should put some stateless processes alongside every storage server and/or TLog. These handle misc. tasks such as the cluster controller, ratekeeper, and GRV processing.
You usually should not need a dedicated process for GRV unless you have an extreme workload.
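To make the placement concrete, here is a rough foundationdb.conf sketch for three host types. It is not a drop-in config: the ports, datadirs, and exact process mix are placeholders, the class names (storage, transaction, stateless, commit_proxy, resolution) are the standard ones as of FDB 7.x to the best of my knowledge, and the usual [fdbserver] general section is omitted.

```ini
# Storage host: one storage process per data disk, stateless alongside.
[fdbserver.4500]
class = storage
datadir = /mnt/data0/foundationdb/4500

[fdbserver.4501]
class = storage
datadir = /mnt/data1/foundationdb/4501

[fdbserver.4510]
class = stateless

# Log host: one TLog per dedicated disk; stateless processes are fine here too.
[fdbserver.4520]
class = transaction
datadir = /mnt/tlog0/foundationdb/4520

# Proxy/resolver host: CPU- and network-heavy, no dedicated data disk needed.
[fdbserver.4530]
class = commit_proxy

[fdbserver.4531]
class = resolution
```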
And how many to recruit (a configure sketch follows this list):
1 resolver, unless more are absolutely necessary.
Adding more than 1 resolver will increase the risk of bogus conflict rejections, especially on transactions that read large key ranges.
Usually the resolver is not the bottleneck; even when it is, that can be a sign of a problematic workload in general, such as a very small hot key range.
A few GRV proxies
I do not have much advice here myself, but 3 should work.
The number of transaction logs depends on your load and your storage servers. Observe the disk utilization %, main loop utilization %, and network usage.
The number of commit proxies also depends on your workload.
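Putting those counts together, a starting point could look like the fdbcli sketch below; the numbers are just this thread's suggestions, not universal defaults, and the separate grv_proxies/commit_proxies settings assume FDB 7.0+:

```
fdb> configure grv_proxies=3 commit_proxies=4 logs=5 resolvers=1
```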
Coordinators
No requirements here.
Storage servers
You should use Redwood for the storage engine.
If your workload has semi-predictable access patterns, try increasing or decreasing the cache configuration (see the sketch below).
If you have a hot read range, such as metadata about the DB, you should use the rangeconfig command to increase its replica count.
You can usually run 2-3 storage class servers on the same disk to saturate the performance.
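A sketch of the storage-engine switch, assuming a recent 7.x release (older builds used the ssd-redwood-1-experimental name instead):

```
fdb> configure ssd-redwood-1
```

For the cache, I believe the per-process cache_memory parameter in foundationdb.conf (e.g. cache_memory = 8GiB on storage processes) is the knob to adjust, but check the configuration docs for your version.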
Side note: I have not seen anyone run a 500 TB FDB cluster. The biggest I have heard of was about 200 TB.
Could you provide some specifics about the workload/use case?
I assume you are putting 7 TLogs on one disk + 7 SS on one disk + 4 stateless on one disk.
You can put stateless processes on the same disk as TLogs/SS, since they don’t actually save anything except maybe some metadata on startup.
You should not exceed 1 TLog per disk, even if you are forced to put SS and TLogs on the same disks because the cluster is tiny.
More than 3-4 SS per disk is likely to cause contention, as that many processes should saturate any disk.
Recruiting 52 logs is too many. 3-5 logs should easily be able to handle about 100 MB/s of transactions.
Same for proxies: 3-5 proxies should be enough. My advice here is to give each host SS and stateless processes, plus either 1 TLog process per disk or several commitproxy/resolution processes.
Also, how are you checking utilization? You may have multiple roles allocated to one process; check status json.
If you can, add a small disk, like ~256 GB, to the hosts for TLogs, or use the boot drive if it has nothing else running on it and can sustain high IOPS.
1. We put the stateless processes on an SSD (256 GB).
2. At the start we had 3 TLogs per disk and 4 SS per disk (3 TLog + 4 SS per host).
4. We are trying to decrease it.
6. We are checking status json and system metrics. Status shows about 100% for TLog and SS, but the total per-disk metrics are much lower than what we got from an fio test: in that test we saw 50k IOPS + 700 MB/s, while the current saturation is 15k IOPS + 5 MB/s (TLog).
7. We can't do that right now; we can only request the typical hardware configuration from the infra department.
Your hosts are oversized. Beyond a certain point FDB does not gain much from vertical scaling; it benefits more from horizontal scaling.
I have personally found the sweet spot for bare metal to be 16 cores and 128 GB of RAM, with 3.84 TB disks for SS and 512 GB for TLog, plus an un-RAIDed boot drive (it can even be a USB!).
Ask your infrastructure department if they can provide smaller hosts.