> My guess is that you’ll probably fall somewhere in 10-15 logs to make it work
I reran the load test with 10 × i3.16xlarge instances (80 disks total), sweeping from 10 log and 70 storage processes up to 15 log and 65 storage processes. Each disk hosts exactly one log or storage process, plus an extra stateless process. In every configuration I run the same number of proxy processes as log processes, and about half that number of resolver processes.
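For reference, the 10-log end of that sweep would be configured roughly like this in fdbcli (a sketch; 5 resolvers is my reading of "about half the number of log processes"):

```
fdb> configure proxies=10 logs=10 resolvers=5
```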
All configurations show similar results. The highest throughput I can reach is around 13K/s, and there are always 2 or 3 storage queues far larger than the others: typically one storage queue sits around 900 MB and another at 1.5 GB, while the log queues are all at the 1.5 GB threshold. Even when I add more disks, up to 13 log and 107 storage processes, the behavior is the same. Based on @mengxu’s post, I think the bottleneck is the storage process e-brake.
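In case anyone wants to reproduce the measurement: these queue sizes can be computed from the status JSON as input bytes minus durable bytes per role. A minimal sketch with the Python bindings (the API version is an assumption; match it to your cluster):

```python
import json
import fdb

fdb.api_version(620)  # assumption: adjust to your installed client version
db = fdb.open()       # uses the default cluster file

# \xff\xff/status/json returns the same document as fdbcli's `status json`.
status = json.loads(db[b'\xff\xff/status/json'])

# Queue size per role = bytes received but not yet durable on disk.
for proc in status['cluster']['processes'].values():
    for role in proc.get('roles', []):
        if role.get('role') in ('storage', 'log'):
            queue = role['input_bytes']['counter'] - role['durable_bytes']['counter']
            print(proc['address'], role['role'], '%.0f MB' % (queue / 1e6))
```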
@ryanworl’s response makes me wonder whether the issue comes from our data modeling. We model a key-value store with expiry by introducing 4 kinds of auxiliary keys. The majority of them are ‘s’ and ‘k’ keys, since each of our logical keys produces these two FoundationDB keys:
- ‘k’ key: 'k' + key (N random bytes in [a-zA-Z])
- ‘s’ key: 's' + 2 random bytes (chosen by hash of key) + 4 bytes of a uint32 representing a timestamp + key
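To make the layout concrete, here is a minimal sketch of how such keys could be built (the helper names and the use of the first two bytes of a SHA-1 digest as the hash-derived prefix are illustrative assumptions, not our exact implementation):

```python
import hashlib
import struct

def k_key(key: bytes) -> bytes:
    # 'k' + key: the point-lookup key holding the value
    return b'k' + key

def s_key(key: bytes, expiry_ts: int) -> bytes:
    # 's' + 2 hash-derived bytes + 4-byte big-endian uint32 timestamp + key:
    # the 2-byte prefix spreads the expiry index over 65536 sub-ranges,
    # while the timestamp keeps each sub-range scannable in expiry order.
    shard = hashlib.sha1(key).digest()[:2]  # assumption: any stable hash works
    return b's' + shard + struct.pack('>I', expiry_ts) + key

print(k_key(b'mykey').hex())
print(s_key(b'mykey', 1600000000).hex())
```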
Is it because most of our FoundationDB keys start with ‘k’ or ‘s’, so most of the traffic falls on 2 storage processes?
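One way I could check this would be the locality API, which exposes the shard boundaries: if each prefix spans many shards, the writes cannot all be landing on 2 processes. A sketch with the Python bindings:

```python
import fdb
import fdb.locality

fdb.api_version(620)  # assumption: adjust to your installed client version
db = fdb.open()

# Count how many shard boundaries fall under each one-byte prefix.
for prefix in (b'k', b's'):
    boundaries = list(fdb.locality.get_boundary_keys(db, prefix, prefix + b'\xff'))
    print(prefix, '->', len(boundaries), 'shard boundaries')
```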
> I am concerned though that you’re pushing up against the limits of what FDB can do for write throughput right now
Is this write throughput limit documented anywhere? Our traffic is around the 20K/s level, and we expect it to grow over time.