Seeing lots of rebalancing after fleet-wide restarts

We operate one of the largest FDB clusters. We recently restarted our fleet cluster-wide for a kernel update. Since then, we have seen long stabilization periods; specifically, there was long rebalancing between the nodes, which pushed the storage queue above 1 GB. Has anyone run into this? Any explanation of why this happens and how to avoid the long stabilization? We would expect rebalancing only when adding or removing a node. We have the following knob changed from its default: we had set a custom shard size of 100 MB and then moved it back to the default of 500 MB.
# Knobs
knob_max_shard_bytes = 100000000


When the shard size changes, data distribution will reshuffle data to create shards that conform to the new size. For example, when you increase the size from 100 MB to 500 MB, existing shards will be merged into larger shards. Two to-be-merged shards can sit on two different hosts; merging them requires relocating one of them.
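One way to watch this reshuffle drain is to track `cluster.data.moving_data` in the output of fdbcli's `status json`. Below is a minimal sketch; the nested field path and the `in_flight_bytes` field name are from our recollection of the status schema, so verify them against the `status json` output of your FDB version:

```python
import json
import subprocess

def moving_data_bytes(status_json: str) -> int:
    """Return the number of bytes data distribution is currently
    relocating, according to a `status json` document (0 if absent)."""
    status = json.loads(status_json)
    moving = status.get("cluster", {}).get("data", {}).get("moving_data", {})
    return moving.get("in_flight_bytes", 0)

# Example with a trimmed-down status document (shape assumed):
sample = '{"cluster": {"data": {"moving_data": {"in_flight_bytes": 1234567}}}}'
print(moving_data_bytes(sample))  # 1234567

# Against a live cluster (requires fdbcli on PATH):
# out = subprocess.run(["fdbcli", "--exec", "status json"],
#                      capture_output=True, text=True).stdout
# print(moving_data_bytes(out))
```

Polling this value after the restart would let you confirm the "long stabilization" is data-distribution movement rather than something else, and see when it trends back to zero.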
