Full disk on one machine results in 99% performance degradation

This is the intended behavior of the cluster. As processes run out of space, the cluster gradually stops handing out read versions, which sharply limits how many new transactions can start. This mechanism (known as ratekeeper) is the same one used to slow down clients when they are saturating the cluster. In this case, we are attempting to protect a relatively small amount of space on each disk (the larger of 5% or 100MB), which makes recovery much easier than if a disk fills completely.
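To make that threshold concrete, here is a small Python sketch of the reserve calculation described above (the function names are illustrative, not part of the FoundationDB API):

```python
# Illustrative only: the "larger of 5% or 100MB" reserve described above.
MIN_RESERVE_BYTES = 100 * 1024 * 1024  # 100MB floor

def protected_bytes(disk_total_bytes: int) -> int:
    """Space ratekeeper tries to keep free on a storage disk."""
    return max(int(disk_total_bytes * 0.05), MIN_RESERVE_BYTES)

def would_throttle(disk_total_bytes: int, disk_free_bytes: int) -> bool:
    """True once free space has dipped into the protected reserve."""
    return disk_free_bytes <= protected_bytes(disk_total_bytes)

# Example: a 500GB disk reserves 25GB (5% > 100MB), so throttling begins
# once free space drops to ~25GB rather than waiting for a truly full disk.
print(protected_bytes(500 * 1024**3) / 1024**3)           # ~25.0
print(would_throttle(500 * 1024**3, 20 * 1024**3))        # True
```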

In the case that only one process is running out of space, the cluster still reacts the same way. This is in part because our data distribution process tries to keep roughly the same amount of data on every storage server. You could imagine data distribution being more sophisticated and assigning data based on how much space is available, and perhaps one day it will, but in that scenario you'd also probably want to be more careful about keeping the load appropriately leveled across processes.
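A rough way to see why one nearly full disk gates the whole cluster under even data distribution: the usable headroom is roughly the smallest per-server free space times the number of servers. A purely illustrative back-of-the-envelope sketch (it ignores replication and other details):

```python
# Rough illustration: with even data distribution, the storage server with the
# least headroom caps how much more data the cluster can accept.
def usable_headroom_bytes(free_bytes_per_server: list[int]) -> int:
    """Approximate additional data the cluster can take before the fullest
    server runs out, assuming new data spreads evenly across servers."""
    return min(free_bytes_per_server) * len(free_bytes_per_server)

# Four servers with plenty of room plus one nearly full server: the cluster's
# effective headroom is dominated by the nearly full one.
free = [400 * 1024**3] * 4 + [2 * 1024**3]
print(usable_headroom_bytes(free) / 1024**3)  # 10.0 GB, not ~1.6 TB
```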

It’s also possible that the cluster could proactively remove a process that has run out of disk space when the rest of the cluster has ample space available. I think there’s some desire to add a general feature like this (i.e. giving the cluster a budget to remove processes that are misbehaving for some reason), and running out of disk space may be a good candidate for one of those reasons.