I have created a tuning advice post on my blog based off of my (limited) experience, advice on this forum and from reading the source code: https://semisol.dev/blog/fdb-tuning/
Please share your thoughts, and experiences at scale that I could add.
Thank you for the information. I got some useful tips, but I was also left confused by a number of points because the recommendations didn’t always come with the reasoning behind them.
You are usually sharing TLog and SS on the same disk, making any impact worse. Non-local disks are still not recommended for larger clusters.
I cannot comment on writes, but I can push over 300 MB/s of reads on a 3-node cluster with 4 SSes per disk (2 disks), plus 2 stateless processes and 1 TLog per disk.
You should easily be able to achieve those writes with 3-5 nodes and a dedicated TLog disk.
The reason to split roles is that after a certain point you can no longer keep a good chunk of the roles on a tight cluster of nodes, and it becomes more performant to split them out.
This comes down to performance, cost, and reliability. You can compensate for the lack of RAID with a higher replication factor; this is also better because the probability of failure roughly ranks as: single node, single disk > single node, all disks > multiple nodes, single disks.
Many things like PSU/power failures, kernel state corruption, or hardware issues could mean your disk is dead or otherwise corrupted with junk.
Cost is obviously better: instead of double replication with RAID 1 (4 disks), you can run triple replication with no RAID for much cheaper (3 disks).
Performance is better as well, because if one disk performs worse, it only impacts the SSes on that disk instead of all the SSes sharing a RAID array.
TLogs cannot benefit from RAID 0: their workload is primarily sequential writes (and not many of them), with fsyncs being the main bottleneck. They benefit more from enterprise drives, which have good fsync performance thanks to power-loss protection (PLP).
RAID 0 could even make the bottleneck worse, since the fsync time of a RAID 0 array is the slower of the two disks.
You need 1 core per stateless process; that is the most it can use. Do not count HT “cores” when sizing, but keep HT enabled, since the kernel and some background processes still need to run.
Yes. This is influenced by:
You should not share TLog disks with SS unless you have a low writes-per-read ratio and you are in a small cluster. (This is why I do it)
FDB already batches I/O to some extent. It is best to tune the amount of SSes depending on your load.
If you experience very high run loop utilization on your SSes, see whether you can get higher performance by adding another SS to the disk: if the disk can process more overall, good; if it can’t, you have found the saturation point.
For many new enterprise NVMes with Redwood, 4 works pretty well.
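For reference, here is a minimal sketch of the per-process sections such a layout could use in foundationdb.conf (one stateless process, one TLog on its own disk, and several SSes sharing a data disk). The ports and mount paths are made up for illustration, and only two of the storage sections are shown:

[fdbserver]
command = /usr/sbin/fdbserver
public-address = auto:$ID
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb

# Stateless process (proxies, resolvers, etc.); roughly one core each.
[fdbserver.4500]
class = stateless

# TLog on its own disk so its fsync latency is not affected by SS traffic.
[fdbserver.4510]
class = transaction
datadir = /mnt/tlog-disk/fdb/4510

# SSes sharing one data disk; add more per disk until throughput stops improving.
[fdbserver.4520]
class = storage
datadir = /mnt/ss-disk/fdb/4520

[fdbserver.4521]
class = storage
datadir = /mnt/ss-disk/fdb/4521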
Redwood as a storage engine is currently unmaintained, but it is stable. It can achieve very high range-clear throughput and the highest read performance, and it is reasonably good at compressing common prefixes.
RocksDB, being an LSM-based design, is not as good at range reads. It mostly has benefits in terms of storage usage efficiency if you want to squeeze every last GB.
Hi,
I suspect that my client does bottleneck, but there is no way to retrieve fdb_database_get_main_thread_busyness in the Go client.
You also mention “You should be using multiple client threads by using the multi-version client.” I was considering this for my use case, but from reading the documentation I assumed it is only used when different client versions are loaded.
I’m using the Go client, and the doc states:
// Spawns multiple worker threads for each version of the client that is loaded. Setting this to a number greater than one implies disable_local_client.
//
// Parameter: Number of client threads to be spawned. Each cluster will be serviced by a single client thread.
func (o NetworkOptions) SetClientThreadsPerVersion(param int64) error {
	return o.setOpt(65, int64ToBytes(param))
}
So should I load the client via fdb.Options().SetExternalClientDirectory and then SetClientThreadsPerVersion to have more network threads?
I have a patch that supports this, which I will hopefully be able to open source.
Each client version can also get a pool of n threads with SetClientThreadsPerVersion. Threads are assigned in a round-robin fashion, one per fdb.Database you create, so if you use multiple clusters you should create the first n database handles for DB 1, then n for DB 2, and so on.
And you need to handle balancing transactions between clients yourself.
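To make that concrete, here is a minimal Go sketch following the explanation above; the external client directory, API version, thread count, and cluster file path are placeholders, and the round-robin balancing across handles is something you implement yourself (note that, depending on the bindings version, OpenDatabase may cache handles per cluster file):

package main

import (
	"log"
	"sync/atomic"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
	fdb.MustAPIVersion(710) // placeholder; match your cluster's API version

	// Network options must be set before the first database is opened,
	// since opening a database starts the client network.
	opts := fdb.Options()
	// Placeholder path: directory containing the libfdb_c client libraries.
	opts.SetExternalClientDirectory("/usr/lib/fdb-clients")
	// Spawn 4 network threads per loaded client version. Per the docs quoted
	// above, this implies disable_local_client, hence the external clients.
	opts.SetClientThreadsPerVersion(4)

	// Each fdb.Database handle is serviced by a single client thread,
	// assigned round-robin, so open one handle per thread for this cluster.
	dbs := make([]fdb.Database, 4)
	for i := range dbs {
		db, err := fdb.OpenDatabase("/etc/foundationdb/fdb.cluster")
		if err != nil {
			log.Fatal(err)
		}
		dbs[i] = db
	}

	// Balancing transactions across handles is up to the application;
	// a simple atomic round-robin counter is one option.
	var next uint64
	pick := func() fdb.Database {
		return dbs[atomic.AddUint64(&next, 1)%uint64(len(dbs))]
	}

	if _, err := pick().Transact(func(tr fdb.Transaction) (interface{}, error) {
		tr.Set(fdb.Key("hello"), []byte("world"))
		return nil, nil
	}); err != nil {
		log.Fatal(err)
	}
}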
Yes