Storage node failure test

ajbeamon · October 15, 2020, 4:59pm

One further thought: even in the absence of a significant increase in client reads, data distribution will be trying to move data from the storage servers that remain in order to re-replicate it. It’s supposed to run at a low-ish speed, but if you were already running close to the performance limit before removing the storage servers, that extra load could push you over the top. You could gauge the effect of data distribution by disabling it during this test. Probably the most surgical way to do that would be to use maintenance mode, which is discussed a bit here:

You could also just disable data movement for all storage server failures by running the following in fdbcli:

fdb> datadistribution disable ssfailure
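If you script the test, it may be worth making sure data movement gets turned back on afterwards, since the cluster won't heal from storage server failures while it is disabled. Below is a minimal sketch of one way to do that, assuming `fdbcli` is on your PATH and that the matching `datadistribution enable ssfailure` command is available in your version. The helper names (`fdbcli_argv`, `run_fdbcli`, `ssfailure_movement_disabled`) are made up for illustration and are not part of FoundationDB.

```python
import subprocess
from contextlib import contextmanager


def fdbcli_argv(command, cluster_file=None):
    """Build the argv for running a single fdbcli command non-interactively."""
    argv = ["fdbcli"]
    if cluster_file is not None:
        # -C selects the cluster file; omit it to use the default location.
        argv += ["-C", cluster_file]
    return argv + ["--exec", command]


def run_fdbcli(command, cluster_file=None):
    """Run one fdbcli command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(fdbcli_argv(command, cluster_file), check=True)


@contextmanager
def ssfailure_movement_disabled(run=run_fdbcli):
    """Disable data movement for storage server failures inside the block.

    `run` is any callable that takes an fdbcli command string; it is a
    parameter so the sequencing can be exercised without a real cluster.
    """
    run("datadistribution disable ssfailure")
    try:
        yield
    finally:
        # Re-enable even if the test raises, so the cluster can heal again.
        run("datadistribution enable ssfailure")


# Usage (run_failure_test is a hypothetical stand-in for your own test):
#
#     with ssfailure_movement_disabled():
#         run_failure_test()
```

Comparing latency and throughput between a run wrapped this way and an unwrapped run should give a rough idea of how much of the degradation comes from re-replication rather than from the lost storage servers themselves.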

If data distribution is pushing you over the edge, then I think the immediate options you have available would be to:

Decrease the client workload
Increase the cluster size (and/or maybe increase replication)
Tweak some knobs to slow down data movement (which would result in slower healing)
