For ideal performance you’ll need to do a bunch of tuning.
Writes
The FDB write path works like this: your client sends the transaction to a commit proxy. The commit proxy forwards the conflict ranges to the resolver, which either allows or rejects the transaction depending on whether there are any conflicts.
From there the transaction is sent to the transaction logs. (All of these processes make up the “transaction system”.)
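To make that flow concrete, here is a toy Python model of it. This is only a sketch and not FDB's actual code: the class names (`ToyResolver`, `ToyCommitProxy`, `Transaction`) are made up for illustration, the proxy hands out commit versions itself as a simplification, and the "transaction log" is just an in-memory list.

```python
from dataclasses import dataclass, field

Range = tuple[bytes, bytes]  # half-open key range [begin, end)


def overlaps(a: Range, b: Range) -> bool:
    """True if two half-open key ranges share at least one key."""
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class Transaction:
    read_version: int
    read_ranges: list[Range] = field(default_factory=list)
    write_ranges: list[Range] = field(default_factory=list)


@dataclass
class ToyResolver:
    # (commit_version, write_range) pairs of transactions accepted so far
    history: list[tuple[int, Range]] = field(default_factory=list)

    def resolve(self, txn: Transaction, commit_version: int) -> bool:
        # Reject if anything this transaction read was overwritten by a
        # commit newer than the version it read at.
        for version, written in self.history:
            if version > txn.read_version and any(
                overlaps(r, written) for r in txn.read_ranges
            ):
                return False
        self.history.extend((commit_version, w) for w in txn.write_ranges)
        return True


@dataclass
class ToyCommitProxy:
    resolver: ToyResolver
    # Stands in for the transaction logs (hypothetical, in-memory only).
    tlog: list[tuple[int, Transaction]] = field(default_factory=list)
    next_version: int = 1

    def commit(self, txn: Transaction) -> bool:
        version = self.next_version
        self.next_version += 1
        if not self.resolver.resolve(txn, version):
            return False  # the client sees a conflict and would retry
        self.tlog.append((version, txn))
        return True


proxy = ToyCommitProxy(ToyResolver())
writer = Transaction(read_version=0, write_ranges=[(b"a", b"b")])
stale_reader = Transaction(
    read_version=0, read_ranges=[(b"a", b"c")], write_ranges=[(b"x", b"y")]
)
writer_ok = proxy.commit(writer)        # accepted, nothing to conflict with
reader_ok = proxy.commit(stale_reader)  # rejected, its read range was overwritten
```

The second transaction is rejected because it read `[a, c)` at version 0 and the first one wrote inside that range at a later version.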
This is my advice on where to run each kind of process:
- Do not put `tlog` processes on the same servers as `commitproxy`/`resolver`.
  - This can cut your throughput significantly, as you are putting two bandwidth-intensive loads on the same host and not fully utilizing cluster capacity.
- Each `tlog` process should have its own disk, not shared with another `tlog` or `storage` process. Stateless processes are fine.
  - Sharing the disk increases flush latencies, which slows down transaction processing.
- Put your `resolution` and `commitproxy` processes on hosts with high CPU performance and enough bandwidth to handle the load.
  - You will always want a few more processes than you configure your cluster for, in case of failures. You should be able to tolerate at least 1 host failing without performance degradation.
- Put `resolution` and `commitproxy` on separate hosts if you can.
- If you cannot allocate dedicated capacity, you can put the `resolution` and `commitproxy` processes alongside storage servers.
- You should put some `stateless` processes alongside every storage server and/or TLog. These handle misc. tasks such as the cluster controller, ratekeeper, and GRV processing.
  - You usually should not need a dedicated process for GRV unless you have an extreme workload.
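The two hard rules above (no `tlog` next to `commitproxy`/`resolution`, and a disk of its own for every `tlog`) are easy to check mechanically. Below is a rough Python sketch of such a check. The `Proc` record, the host names, and the disk names are all hypothetical; it does not read a real cluster, you would have to fill the list in from your own inventory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proc:
    host: str
    process_class: str          # informal names as used above: "tlog", "storage", ...
    disk: Optional[str] = None  # None for processes that keep no data on disk


def placement_problems(procs: list[Proc]) -> list[str]:
    """Return a human-readable list of violations of the placement advice."""
    by_host = defaultdict(set)
    by_disk = defaultdict(list)
    for p in procs:
        by_host[p.host].add(p.process_class)
        if p.disk is not None and p.process_class in {"tlog", "storage"}:
            by_disk[(p.host, p.disk)].append(p.process_class)

    problems = []
    # Rule 1: tlog must not share a host with commitproxy or resolution.
    for host, classes in sorted(by_host.items()):
        if "tlog" in classes and classes & {"commitproxy", "resolution"}:
            problems.append(f"{host}: tlog shares a host with commitproxy/resolution")
    # Rule 2: a disk used by a tlog must not hold another tlog or storage process.
    for (host, disk), classes in sorted(by_disk.items()):
        if "tlog" in classes and len(classes) > 1:
            problems.append(f"{host}:{disk}: tlog shares a disk with another tlog or storage process")
    return problems


good_layout = [
    Proc("host-a", "tlog", "/dev/nvme0"),
    Proc("host-a", "stateless"),
    Proc("host-b", "commitproxy"),
    Proc("host-b", "storage", "/dev/nvme0"),
    Proc("host-b", "storage", "/dev/nvme0"),
]
bad_layout = good_layout + [Proc("host-b", "tlog", "/dev/nvme0")]
```

Calling `placement_problems(good_layout)` returns an empty list, while `bad_layout` trips both rules. The softer advice (spare capacity, separate hosts for `resolution` and `commitproxy`) is left out on purpose, since it depends on what hardware you have.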
And how many to recruit:
- 1 resolver unless absolutely necessary.
  - Running more than 1 resolver increases the risk of spurious conflict rejections, especially for transactions that read a lot of key ranges.
  - The resolver is usually not the bottleneck. Even when it is, that can be a sign of a problematic workload in general, such as a very small hot key range.
- A few GRV proxies.
  - I do not have much advice here myself, but 3 should work.
- The number of transaction servers depends on your load and your storage servers. Watch the disk utilization %, main loop utilization %, and network usage.
- The number of commit proxies also depends on your workload.
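If it helps, here is a small Python helper that turns those counts into a line you could paste into `fdbcli`. Treat it as a sketch: the parameter names (`resolvers`, `grv_proxies`, `commit_proxies`, `logs`) are the ones I understand `configure` to accept in FoundationDB 7.x, and I am assuming "transaction servers" maps to `logs`, so check `help configure` on your version first. The values 4 and 6 are placeholders, not recommendations.

```python
def configure_command(
    commit_proxies: int,
    logs: int,
    grv_proxies: int = 3,  # "3 should work" from the list above
    resolvers: int = 1,    # 1 unless absolutely necessary
) -> str:
    """Build an fdbcli `configure` line for the transaction-system role counts."""
    counts = {
        "resolvers": resolvers,
        "grv_proxies": grv_proxies,
        "commit_proxies": commit_proxies,
        "logs": logs,
    }
    for name, value in counts.items():
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    return "configure " + " ".join(f"{k}={v}" for k, v in counts.items())


# Placeholder counts; size these from your own utilization numbers.
command = configure_command(commit_proxies=4, logs=6)
```

You would then run the resulting string inside `fdbcli`, and adjust `commit_proxies` and `logs` as you watch the metrics mentioned above.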
Coordinators
No requirements here.
Storage servers
You should use Redwood for the storage engine.
Depending on whether your workload has semi-predictable access patterns, try increasing or decreasing the cache size.
If you have a hot read range, such as metadata about the DB, you should use the `rangeconfig` command to increase its replica count.
You can usually run 2-3 `storage` class servers on the same disk to saturate the disk's performance.
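As a sketch of what that looks like in practice, the Python snippet below generates per-process sections for `foundationdb.conf`, with several `storage` processes pointing at data directories on the same disk. The `[fdbserver.<port>]` section form and the `class` and `datadir` keys are standard `foundationdb.conf` settings; the mount points, the port numbering, and the `storage_sections` helper are assumptions for illustration only.

```python
import configparser
import io


def storage_sections(disks: list[str], per_disk: int = 2, base_port: int = 4500) -> str:
    """Render [fdbserver.<port>] sections, `per_disk` storage processes per disk."""
    conf = configparser.ConfigParser()
    port = base_port
    for disk in disks:
        for _ in range(per_disk):
            conf[f"fdbserver.{port}"] = {
                "class": "storage",
                # Each process gets its own directory on the shared disk.
                "datadir": f"{disk}/{port}",
            }
            port += 1
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


# Hypothetical mount points; replace with your own.
conf_text = storage_sections(["/mnt/disk1", "/mnt/disk2"], per_disk=2)
```

Start with 2 per disk and only go to 3 if the disk still has headroom; the right number depends on the disk.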
Side note: I have not seen anyone run a 500TB FDB cluster. The biggest I have heard of was about 200TB.
Could you provide some specifics about the workload/use case?