We deployed the application on 5 hosts (that is, 5 standalone instances) and ran a stress test (simple read transactions only). We observed the following:
the CPU usage of each FoundationDB grv_proxy reached 95%
the CPU usage of each FoundationDB stateless process (transaction log server) reached 60%
Then we increased the number of grv_proxy processes from 13 to 30 and observed the following:
the CPU usage of each FoundationDB grv_proxy reached 80%
the CPU usage of each FoundationDB stateless process (transaction log) reached 85%
I tried many methods but still couldn't lower the CPU usage. Then one day I found that decreasing 'ClientThreadsPerVersion' from 10 to 5 significantly reduced CPU usage, from 95% to 58%. So my questions are:
Why does increasing the number of grv_proxy processes not significantly reduce the CPU usage of the grv_proxy servers, and why does it have the side effect of increasing the CPU usage of the transaction log servers?
Why does decreasing 'ClientThreadsPerVersion' significantly reduce CPU usage?
FYI, stateless processes cannot be transaction logs, but they can be transaction proxies, which is what you are seeing.
You have too many proxies. Usually 2-3 should be enough.
When you start a read transaction, there is a GRV request. Each request will contact all commit proxies (I believe), which are the stateless processes you have. This introduces a quadratic cost that scales with the number of GRV and commit proxies.
Why does decreasing 'ClientThreadsPerVersion' significantly reduce CPU usage?
Clients do their own GRV batching as well.
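For what it's worth, here is a minimal sketch of where that knob sits on the client side, assuming the Python bindings and API version 710; ClientThreadsPerVersion maps to the client_threads_per_version network option, and the exact option name and requirements (e.g. external client libraries) may differ in your client version:

```python
import fdb

fdb.api_version(710)

# Assumption: the application controls ClientThreadsPerVersion through the
# client_threads_per_version network option, set before the network starts.
# Each client thread runs its own network loop and does its own GRV
# batching, so 10 threads generate roughly twice as many independent GRV
# request streams as 5 threads for the same workload.
fdb.options.set_client_threads_per_version(5)

# The multi-threaded client usually also needs external client libraries,
# e.g. fdb.options.set_external_client_directory("/path/to/fdb/clients")
# (hypothetical path).

db = fdb.open()  # open the database only after the options above are set
```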
Have you tried with 2-3 GRV proxies and 1-2 commit proxies?
Have you tried with 2-3 GRV proxies and 1-2 commit proxies?
I haven't tried that yet. In my production environment, the expected data volume is 500 TB, with an anticipated KV read QPS of 300K and KV write QPS of 100K. I'm now planning to run a stress test; do you have any recommendations for the number and ratio of GRV proxies to commit proxies?
For ideal performance you’ll need to do a bunch of tuning.
Writes
The FDB write path works by your client sending the transaction to a commit proxy. The commit proxy then forwards the conflict ranges to the resolver, which either allows or rejects the transaction depending on whether there are any conflicts.
From there the transaction is sent to the transaction logs. (All of these processes make up the "transaction system".)
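As a concrete illustration of that path (just a sketch using the Python bindings; the key names are made up and assumed to already hold integer byte strings):

```python
import fdb

fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def transfer(tr, src, dst, amount):
    # The keys read here become the transaction's read-conflict ranges.
    src_balance = int(tr[src])
    dst_balance = int(tr[dst])
    tr[src] = str(src_balance - amount).encode()
    tr[dst] = str(dst_balance + amount).encode()
    # At commit time the client sends the transaction to a commit proxy,
    # which forwards the conflict ranges to the resolver. If a concurrently
    # committed transaction wrote these keys, the commit is rejected and
    # the @fdb.transactional decorator retries the function.

transfer(db, b"balance/alice", b"balance/bob", 10)
```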
This is my advice on how to configure the processes you run (see the foundationdb.conf sketch after this list):
Do not put TLog processes on the same servers as commitproxy/resolver processes.
Doing so can cut your throughput significantly, as you would be putting two bandwidth-intensive loads on the same host and not fully utilizing cluster capacity.
Each TLog process should have its own disk, not shared with another TLog or storage process. Stateless processes are fine.
Sharing a disk will increase flush latencies, which slows down transaction processing.
Put your resolution and commitproxy processes on hosts with high CPU performance and enough network bandwidth to handle the load.
You will always want a few more processes than you configure your cluster for, in case of failures. You should be able to tolerate at least 1 host failing without performance degradation.
Put resolution and commitproxy on separate hosts if you can.
If you cannot allocate dedicated capacity, you can put the resolution and commitproxy processes alongside storage servers.
You should put some stateless processes alongside every storage server and/or TLog. These handle misc. tasks such as the cluster controller, ratekeeper, and GRV processing.
You usually should not need a dedicated process for GRV unless you have an extreme workload.
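To make the placement concrete, here is a rough foundationdb.conf sketch for three host types. It is not a drop-in config: the ports, datadirs, and exact process mix are placeholders, the class names (storage, transaction, stateless, commit_proxy, resolution) are the standard ones as of FDB 7.x to the best of my knowledge, and the usual [fdbserver] general section is omitted.

```ini
# Storage host: one storage process per data disk, stateless alongside.
[fdbserver.4500]
class = storage
datadir = /mnt/data0/foundationdb/4500

[fdbserver.4501]
class = storage
datadir = /mnt/data1/foundationdb/4501

[fdbserver.4510]
class = stateless

# Log host: one TLog per dedicated disk; stateless processes are fine here too.
[fdbserver.4520]
class = transaction
datadir = /mnt/tlog0/foundationdb/4520

# Proxy/resolver host: CPU- and network-heavy, no dedicated data disk needed.
[fdbserver.4530]
class = commit_proxy

[fdbserver.4531]
class = resolution
```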
And how many to recruit (a configure sketch follows this list):
1 resolver, unless more are absolutely necessary.
Adding more than 1 resolver will increase the risk of bogus conflict rejections, especially on transactions that read large key ranges.
Usually the resolver is not the bottleneck; even when it is, that can be a sign of a problematic workload in general, such as a very small hot key range.
A few GRV proxies
I do not have much advice here myself, but 3 should work.
The number of transaction logs depends on your load and your storage servers. Observe the disk utilization %, main loop utilization %, and network usage.
The number of commit proxies also depends on your workload.
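Putting those counts together, a starting point could look like the fdbcli sketch below; the numbers are just this thread's suggestions, not universal defaults, and the separate grv_proxies/commit_proxies settings assume FDB 7.0+:

```
fdb> configure grv_proxies=3 commit_proxies=4 logs=5 resolvers=1
```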
Coordinators
No requirements here.
Storage servers
You should use Redwood for the storage engine.
If your workload has semi-predictable access patterns, try increasing or decreasing the cache configuration (see the sketch below).
If you have a hot read range, such as metadata about the DB, you should use the rangeconfig command to increase its replica count.
You can usually run 2-3 storage class servers on the same disk to saturate the performance.
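A sketch of the storage-engine switch, assuming a recent 7.x release (older builds used the ssd-redwood-1-experimental name instead):

```
fdb> configure ssd-redwood-1
```

For the cache, I believe the per-process cache_memory parameter in foundationdb.conf (e.g. cache_memory = 8GiB on storage processes) is the knob to adjust, but check the configuration docs for your version.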
Side note: I have not seen anyone run a 500 TB FDB cluster. The biggest I have heard of was about 200 TB.
Could you provide some specifics about the workload/use case?
I assume you are putting 7 TLogs on one disk + 7 SS on one disk + 4 stateless on one disk.
You can put stateless processes on the same disk as TLogs/SS, since they don’t actually save anything except maybe some metadata on startup.
You should not exceed 1 TLog per disk, even if you are forced to put SS and TLogs on the same disks because the cluster is tiny.
More than 3-4 SS per disk is likely to cause contention, as that many processes should saturate any disk.
Recruiting 52 logs is too many. 3-5 logs should easily be able to handle about 100 MB/s of transactions.
Same for proxies: 3-5 proxies should be enough. My advice here is to give each host SS and stateless processes, plus either 1 TLog process per disk or several commitproxy/resolution processes.
Also, how are you checking utilization? You may have multiple roles allocated to one process; check status json.
If you can, add a small disk, like ~256 GB, to the hosts for TLogs, or use the boot drive if it has nothing else running on it and can sustain high IOPS.
1. We put the stateless processes on an SSD (256 GB).
2. At the start we had 3 TLogs per disk and 4 SS per disk (3 TLog + 4 SS per host).
4. We are trying to decrease it.
6. We are checking status json and system metrics. Status shows about 100% for TLog and SS, but the total per-disk metrics are much lower than what we got from an fio test: in that test we saw 50k IOPS + 700 MB/s, while the current saturation is 15k IOPS + 5 MB/s (TLog).
7. We can't do that right now; we can only request the typical hardware configuration from the infra department.
Your hosts are oversized. Beyond a certain point FDB does not gain much from vertical scaling; it benefits more from horizontal scaling.
I have personally found the sweet spot for bare metal to be 16 cores and 128 GB of RAM, with 3.84 TB disks for SS and 512 GB for TLog, plus an un-RAIDed boot drive (it can even be a USB!).
Ask your infrastructure department if they can provide smaller hosts.