You could try deploying more than one storage process per disk. It looks like you’ve got one storage process per disk right now, and a single storage process writing at a time can’t issue enough concurrent IO to reach the disk’s peak IOPS. I don’t think this is spelled out anywhere in the documentation, but it’s worth a shot.
If you look at the benchmark numbers in the documentation, you’ll see multiple storage processes per disk. In some cases the machines only had a single disk and every process was writing to it.
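In case it helps, here’s a rough sketch of what that layout could look like in foundationdb.conf. The ports and datadir paths are made up for illustration; the point is just multiple `[fdbserver.<port>]` sections with `class = storage`, two of them pointed at each disk:

```
# Hypothetical fragment: two storage processes per disk.
# Ports and mount points are examples only -- adjust to your layout.
[fdbserver.4500]
class = storage
datadir = /mnt/disk1/foundationdb/4500

[fdbserver.4501]
class = storage
datadir = /mnt/disk1/foundationdb/4501

[fdbserver.4502]
class = storage
datadir = /mnt/disk2/foundationdb/4502

[fdbserver.4503]
class = storage
datadir = /mnt/disk2/foundationdb/4503
```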
Just to show my work a bit here:
It sounds like the sustainable steady state is somewhere between where you ended up (80kHz) and where you started (180kHz), so let’s call it 100kHz, since FDB has a significant amount of burst write absorption capacity and the initial rate overstates what the disks can sustain.
If only 4 nodes are running storage processes, each node has two disks, and each disk hosts one storage process, that’s a total of 8 disks and 8 storage processes. Since it sounds like you’re not writing data sequentially, the load should spread roughly evenly, which works out to 12.5kHz per process (and per disk).
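Spelled out as a quick back-of-the-envelope calculation (the rate and the counts here are just the assumptions above, not measured values):

```python
# Rough sketch of the division above; all inputs are assumptions from this thread.
sustained_write_rate_hz = 100_000   # steady state guess, between 80kHz and 180kHz
storage_nodes = 4
disks_per_node = 2
processes_per_disk = 1              # current layout: one storage process per disk

disks = storage_nodes * disks_per_node              # 8 disks
storage_processes = disks * processes_per_disk      # 8 storage processes

# Writes spread roughly evenly (no sequential locality), so each process/disk sees:
per_process_hz = sustained_write_rate_hz / storage_processes
print(f"{per_process_hz:,.1f} writes/sec per storage process (and per disk)")
# -> 12,500.0 writes/sec per storage process (and per disk)
```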
Source (66% read IOs, 33% write IOs, which isn’t identical to your workload but is useful as a model of the expected performance)
Notice that even NVMe SSDs need a deeper queue of outstanding IOs to reach their peak IOPS. You’re also going to be doing some read IO on the write path, since the storage engine is a B-tree and not an LSM tree.
I don’t know the size of your writes, nor do I know the exact specs of the disks you’re using, but it sounds like you’ve gotten exactly the performance I would expect.
If you add more storage processes you’ll have more outstanding IOs, which should lead to higher throughput.
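As a rough illustration of why (Little’s Law, with a made-up per-IO latency; the real numbers depend on your disks and write sizes):

```python
# Illustrative only: throughput is bounded by outstanding IOs / per-IO latency,
# until the device itself saturates.
per_io_latency_s = 100e-6   # assume ~100µs per write IO; your disks will differ

for outstanding_ios in (1, 4, 8, 16):
    iops = outstanding_ios / per_io_latency_s
    print(f"queue depth {outstanding_ios:>2}: up to {iops:,.0f} IOPS")

# queue depth  1: up to 10,000 IOPS
# queue depth  4: up to 40,000 IOPS
# queue depth  8: up to 80,000 IOPS
# queue depth 16: up to 160,000 IOPS
```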