Cluster unresponsive when some of the nodes have full disk

ex3ndr · July 7, 2020, 1:29am

I misconfigured some of our nodes to use the wrong disk, and when these nodes became full, our cluster came to a full stop. The cluster still reports in fdbcli, but the transaction commit rate dropped to only about 1 Hz, making the cluster basically dead.

Also, since these disks are, unfortunately, HDDs, it seems it would take a while to exclude them from the cluster.

This behaviour is extremely surprising. Does this mean that we must have a lot of extra capacity and that all nodes have to use disks of the same size?

UPD: Eventually the cluster became “Healthy (Repartitioning)”, but it still can’t answer queries.
UPD2: Also, I have to exclude them forcibly, since I am getting “ERROR: This exclude may cause the total free space in the cluster to drop below 10%.”, but that’s not true at all.
UPD3: Removing the node that had zero available space AND restarting the cluster (kill all) worked. Adding this node back to the cluster immediately kills the cluster.
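
For anyone hitting the same problem, here is a rough sketch of how the nearly full processes could be spotted before deciding which ones to exclude. It assumes the output of `fdbcli --exec "status json"` has been saved to a file and parsed, and that each entry under `cluster.processes` carries `address`, `disk.free_bytes` and `disk.total_bytes`. Those field names are my assumption about the status JSON layout and should be checked against your FoundationDB version. The function name `low_disk_processes` and the 10% threshold are only illustrative.

```python
import json


def low_disk_processes(status, min_free_ratio=0.10):
    """Return (address, free_ratio) pairs for processes below the threshold.

    `status` is the parsed document from `status json`. The field names
    used here (cluster.processes.*.disk.free_bytes / total_bytes) are an
    assumption about that layout; verify them for your version.
    """
    processes = status.get("cluster", {}).get("processes", {})
    flagged = []
    for proc in processes.values():
        disk = proc.get("disk", {})
        total = disk.get("total_bytes", 0)
        free = disk.get("free_bytes", 0)
        if total <= 0:
            # No usable disk information for this process; skip it.
            continue
        ratio = free / total
        if ratio < min_free_ratio:
            flagged.append((proc.get("address", "unknown"), ratio))
    # Fullest disks first.
    return sorted(flagged, key=lambda item: item[1])


def load_status(path):
    """Load a status document previously saved to `path` as JSON."""
    with open(path) as handle:
        return json.load(handle)
```

Something like `low_disk_processes(load_status("status.json"))` would then list the addresses whose disks are close to full, which are the candidates for `exclude`.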

gaurav · July 7, 2020, 10:32am

See this thread for the effects of disk space running out on any of the storage servers (SS).
