From the information you’ve provided so far, it sounds like you may be saturating the disks on your storage servers. Do you have any external evidence (e.g. disk utilization or queue depth metrics) that would support or contradict that idea?
Based on the numbers you’ve provided (30K PUTs/s of 7050 bytes each, triple redundancy), you would be writing something like 211 MB/s logical and at least 633 MB/s physical. It sounds like you have all 32 processes on a host sharing the same striped volume, which may be introducing some inefficiencies. For example, we recommend that the logs not share disks with the storage servers, as the two have rather different write patterns, with the logs fsyncing frequently. Also, with only 20 disks, a 211 MB/s logical write rate is higher than I would have expected to be sustainable.
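For reference, here is the arithmetic behind those figures as a small Python sketch. It only covers the storage writes, so it ignores log traffic and any write amplification in the storage engine; the real per-disk load would be somewhat higher:

```python
# Back-of-the-envelope check of the write rates quoted above,
# using the numbers from your description.
puts_per_sec = 30_000   # 30K PUTs/s
value_bytes = 7_050     # bytes per PUT
replication = 3         # triple redundancy
disks = 20              # disks per host, from your setup

logical_mb_s = puts_per_sec * value_bytes / 1e6   # ~211 MB/s
physical_mb_s = logical_mb_s * replication        # ~634 MB/s
per_disk_mb_s = physical_mb_s / disks             # ~32 MB/s per disk, before
                                                  # log traffic and write amplification

print(f"logical:  {logical_mb_s:.1f} MB/s")
print(f"physical: {physical_mb_s:.1f} MB/s")
print(f"per disk: {per_disk_mb_s:.1f} MB/s")
```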
Besides trying to rearrange things a little more efficiently (e.g. by separating the logs, as described above), I think the only real recourse if you are disk bound is to either reduce your write rate or add more disks.
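If it helps with the "add more disks" option, here is a rough sizing sketch. The sustainable per-disk write rate below is purely a placeholder assumption; you'd want to measure your actual disks (e.g. with fio) under a realistic write pattern before drawing any conclusions from it:

```python
# Rough sizing sketch: how many disks the current physical write rate
# would require, given an assumed sustainable per-disk write rate.
import math

physical_mb_s = 30_000 * 7_050 * 3 / 1e6   # ~634 MB/s, from the numbers above
SUSTAINABLE_MB_S_PER_DISK = 30             # placeholder assumption, not a measured value

disks_needed = math.ceil(physical_mb_s / SUSTAINABLE_MB_S_PER_DISK)
print(f"~{disks_needed} disks at {SUSTAINABLE_MB_S_PER_DISK} MB/s sustained each")
```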
As a side note, you mentioned that you were able to achieve a high rate for a while before eventually slowing down. We’ve seen similar behavior from SSDs before, where write rates degrade after long periods of sustained writes or as the drives fill up. How full are your disks now?
Another side note: it looks like the i3.8xlarge instances have 32 vCPUs, or 16 physical cores. Although it sounds like you aren’t currently CPU bound, we also recommend that each process in a cluster gets its own physical core (or at least something close to that). If you run 1 process per logical core, you may find that as the cluster gets busier CPU-wise (say around 50% on average), processes start getting starved, which can significantly affect cluster stability; depending on the severity of the starvation, the cluster may not handle the situation particularly gracefully.
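For illustration, the sizing rule of thumb above works out as follows (the vCPU count is from the i3.8xlarge description; the 2-threads-per-core figure is the usual hyperthreading assumption):

```python
# Size the process count by physical cores, not vCPUs.
vcpus = 32              # i3.8xlarge, per the instance description
threads_per_core = 2    # assumes hyperthreading (2 vCPUs per physical core)

physical_cores = vcpus // threads_per_core   # 16
recommended_processes = physical_cores       # roughly 1 process per physical core

print(f"{physical_cores} physical cores -> about {recommended_processes} processes per host")
```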