You seem to be running a very write-heavy workload. As the disk doesn’t seem to be saturated, my guess is that the write-queue within FDB is the problem.
The disk interface uses AIO with O_DIRECT, and we only allow 64 operations to be queued at a time, with reads getting higher priority. This means that the max throughput is somewhere around 4KB * 64 / disk-latency. In other words: your performance might be limited by disk latency instead of disk throughput.
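To put rough numbers on that bound, here is a small back-of-the-envelope sketch. It only assumes the 4KB operation size and the 64-operation limit mentioned above. The function name and the sample latencies are made up for illustration, and real throughput will be lower once reads compete for the same queue.

```python
PAGE_BYTES = 4 * 1024   # 4KB per operation, as stated above
QUEUE_DEPTH = 64        # operations allowed in flight at once, as stated above


def max_throughput_bytes_per_s(latency_s, queue_depth=QUEUE_DEPTH, page_bytes=PAGE_BYTES):
    """Upper bound on throughput when at most queue_depth operations are in flight.

    Hypothetical helper: each operation moves page_bytes and takes latency_s
    to complete, so the queue drains at queue_depth / latency_s operations
    per second.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return page_bytes * queue_depth / latency_s


if __name__ == "__main__":
    # Sample latencies are illustrative, not measurements.
    for latency_ms in (0.1, 1.0, 10.0):
        bound = max_throughput_bytes_per_s(latency_ms / 1000.0)
        print(f"{latency_ms:5.1f} ms latency -> about {bound / 1e6:.1f} MB/s")
```

At 1 ms per operation this works out to roughly 262 MB/s, and at 10 ms it drops to about 26 MB/s, so a disk can sit well below its rated throughput while the queue is the real limit.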
There are a few things you can do:
- If you have enough CPUs, I would suggest starting more storage servers per disk. You need one CPU core per storage server. You could also try to oversubscribe and run two storage servers on one core, but that might give you weird read-latency behavior.
- There’s a knob called MAX_OUTSTANDING which controls how many operations fdb sends to the storage. You can try to set this to a higher value.
I think the first suggestion has a higher probability of success, so if you can, I would try that first.