How do I troubleshoot throughput performance degradation?


I reran the load test. There were 3 storage processes with larger queue sizes at the beginning, but this time the third-largest queue shrank fairly quickly. After running for a while, the storage processes' input_bytes.hz basically forms 3 groups; the first group (the ~15 MB/s one) consists of the 3 processes that had larger queues at the start. For the other metrics you mentioned, all processes report values in a similar range.


However, even knowing that these 3 processes write more data than the others, where do I go from there? 15 MB/s is far below each underlying SSD's capability, and each of them is the only process writing to its own disk.

Our implementation doesn't use the tuple layer; it uses raw byte arrays as keys directly. For example, our 's' keys have 4 segments ('s', 2 bytes of hash(key), 4 bytes of timestamp, key), and we use the concatenation of the byte representations of the segments as the FoundationDB key.
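For concreteness, that 4-segment layout could be built roughly like this. This is only a sketch: the specific hash function (sha256 here) and the big-endian byte order are my assumptions, not details from the post.

```python
import hashlib
import struct

def make_s_key(user_key: bytes, ts: int) -> bytes:
    # 's' prefix + 2-byte hash of the key + 4-byte timestamp + raw key.
    # sha256 and big-endian packing are illustrative assumptions; the
    # actual implementation may use a different hash or byte order.
    h = hashlib.sha256(user_key).digest()[:2]
    return b"s" + h + struct.pack(">I", ts) + user_key
```

Because the 2-byte hash comes right after the 's' prefix, writes are spread across the hash space rather than landing on one hot range.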

Does this count as "different key spaces", or is everything in one key space?

(Meng Xu) #22

They should be split into different shards by the data distributor.

I’m wondering if you preloaded some small amount of data that pre-creates those shards as Alex suggested?
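One way to do that preload, given the key format described earlier in the thread, is to write a small number of dummy keys spread evenly across the 2-byte hash-prefix space before the real load starts, so the data distributor has boundaries to split on. This is a sketch under assumptions: the step size is arbitrary, and writing these keys through the FoundationDB client is omitted.

```python
import struct

def preload_boundary_keys(step: int = 0x100):
    # Yield dummy keys covering the full 0x0000-0xFFFF 2-byte hash
    # prefix range. The 's' prefix mirrors the key layout described
    # above; step=0x100 (256 keys) is an arbitrary assumption.
    for prefix in range(0, 0x10000, step):
        yield b"s" + struct.pack(">H", prefix)
```

Each generated key would then be written once (with a small dummy value) in a normal transaction before starting the load test.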


Basically, "stored_bytes" was never imbalanced, from the beginning until 2 or 3 storage processes' storage queues fell way behind during my load test. Do I still need to pre-load data?


This sounded similar to KrzysFR's post that Evan mentioned, but since data distribution looks good, that might be a different issue.

(Alex Miller) #24

Hmmm. That would probably suggest that you’re managing to split reasonably.

Can you pastebin a status json so I can double check a couple things?


Here is the “status json” output.
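For anyone following along, the per-process storage queue size can be estimated from that output (obtained via `fdbcli --exec "status json"`) as input bytes minus durable bytes. The field paths below match the status json schema I've seen in recent FoundationDB releases, but treat them as assumptions.

```python
def storage_queue_bytes(status: dict) -> dict:
    # For each process with a "storage" role, compute the approximate
    # queue size: bytes received (input_bytes.counter) minus bytes made
    # durable (durable_bytes.counter). Field names are assumptions based
    # on the status json schema.
    out = {}
    for proc_id, proc in status["cluster"]["processes"].items():
        for role in proc.get("roles", []):
            if role.get("role") == "storage":
                queue = (role["input_bytes"]["counter"]
                         - role["durable_bytes"]["counter"])
                out[proc.get("address", proc_id)] = queue
    return out
```

Feeding the parsed status json dict to this function and sorting the result would show whether the same 3 processes are consistently behind.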