I’m attempting to benchmark FoundationDB for a write-heavy workload*, and I’m struggling to max out the storage engine(s). I’m wondering whether there are known limits on transaction/operation throughput per cluster, and where I could start looking if I wanted to contribute changes to improve that throughput.
I’ve followed a number of the performance-related threads here, so here’s a detailed rundown of what I’ve tried so far.
I’ve been comparing the memory storage engine on r5d.2xlarge instances to the Redwood engine on i3.2xlarge instances. The performance gap between the two seems pretty small, maybe 10-20% more throughput for the in-memory engine. For both engines, the rate limiter doesn’t report the storage servers getting backed up at all, though status does list issues initiating batch-priority transactions. With the i3s I see roughly 20-30k IOPS; the AWS documentation indicates those instance types can reach up to 180k IOPS, so there should be a good deal of headroom. The write queue shows a lot more variance, ranging from ~5 to ~2k.
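For reference, this is roughly how I’ve been switching engines and checking the rate limiter from fdbcli. The Redwood engine name is version-dependent (it was the experimental `ssd-redwood-1-experimental` on the 6.3 line), so treat these commands as illustrative:

```
fdb> configure memory
fdb> configure ssd-redwood-1-experimental
fdb> status details
```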
Drawing inspiration from this topic, I’ve been laying out the processes explicitly. Each box runs four FDB processes: storage boxes have four storage processes, and log boxes have one log process, one proxy process and two resolver processes. The cluster_controller and master are added as a fifth process on two randomly chosen log boxes. After trying a bunch of different configurations, I ended up with 128 storage boxes (512 processes), 32 logs, 32 proxies and 64 resolvers; a sketch of the per-box layout is below. I’m writing to the cluster from 32 different client processes.
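In case the layout matters, here’s a minimal sketch of the per-box foundationdb.conf I’m describing. The ports and class names are illustrative (on some versions the log class is spelled `transaction`), and the two snippets are of course two separate files:

```ini
# foundationdb.conf on a storage box: four storage processes
[fdbserver.4500]
class = storage
[fdbserver.4501]
class = storage
[fdbserver.4502]
class = storage
[fdbserver.4503]
class = storage
```

```ini
# foundationdb.conf on a log box: one log, one proxy, two resolvers
[fdbserver.4500]
class = transaction
[fdbserver.4501]
class = proxy
[fdbserver.4502]
class = resolution
[fdbserver.4503]
class = resolution
```

The role counts themselves are set from fdbcli with `configure logs=32 proxies=32 resolvers=64`.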
Data seems reasonably well distributed, and I don’t see outliers on any system metric. Network traffic stays under 80 MiB/s, so I don’t think we’re limited there. CPU/load is pretty low on the log servers. On the storage servers, load ranges from 0.3 to 0.6 and CPU usage is between 20% and 40%.
The keys are on average ~100 bytes, ranging between about 50 and 230 bytes. Each key is one of two unique 5-byte prefixes, followed by an 8-byte integer drawn from a few thousand sequential values (from an RDB), a few (usually English) words, and a random 16-byte suffix. I’ve varied the number of keys written per batch and got the best performance at around 200; a sketch of the write path is below.
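To make the key shape and batching concrete, here’s a minimal sketch of the write path using the Python bindings. The prefixes, word list, values and API version are stand-ins for illustration, not my actual workload code:

```python
import os
import random
import struct

import fdb

fdb.api_version(630)  # assumed 6.3 API; match your cluster version
db = fdb.open()

PREFIXES = [b"pre1\x00", b"pre2\x00"]  # stand-ins for the two unique 5-byte prefixes
WORDS = [b"alpha", b"bravo", b"charlie", b"delta"]  # stand-in word list

def make_key(seq_id):
    # prefix + 8-byte big-endian sequential integer + a few words + random 16-byte suffix
    return (random.choice(PREFIXES)
            + struct.pack(">Q", seq_id)
            + b" ".join(random.sample(WORDS, 2))
            + os.urandom(16))

@fdb.transactional
def write_batch(tr, pairs):
    # one transaction per batch; ~200 keys per transaction performed best for me
    for k, v in pairs:
        tr[k] = v

batch = [(make_key(i), b"some value") for i in range(200)]
write_batch(db, batch)
```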
* I’m aware that FoundationDB isn’t the best fit for some write-heavy workloads, but I have yet to see another open-source ordered key-value store work more efficiently. I’m also optimistic about the idea of adding a RocksDB-backed storage engine, assuming we can get the transaction throughput we need.