Hi, I’m trying to debug uneven data distribution in my cluster. The cluster initially had 4 storage servers, all on one data drive, and last week I added 4 new storage servers on a new drive. Ever since it finished repartitioning, stored_bytes has been higher on the 4 new servers, which makes the IO load much heavier on the new drive.
One thing I’ve also been looking at is the slow reclamation of space after expanding a cluster, so yesterday I made spring cleaning vacuuming more aggressive on the old storage servers. Since doing that, I’ve noticed
stored_bytes start to even out between the servers. (As in, data from the newer servers started moving back to the older servers.) I started looking at DataDistribution to see how space taken by sqlite free pages contributes to the utilization of each server, and I don’t think I’m reading it correctly.
I’m looking at the getLoadBytes logic: https://github.com/apple/foundationdb/blob/a9366f39b59453ee0bf0d4e08c7a556b62ec898f/fdbserver/DataDistribution.actor.cpp#L221-L236
Where load is defined by
(physicalBytes + (inflightPenalty*inFlightBytes)) * availableSpaceMultiplier
My understanding is:
- Physical bytes: the sum of the sizes of all key/value pairs owned by this team (https://github.com/apple/foundationdb/blob/a9366f39b59453ee0bf0d4e08c7a556b62ec898f/fdbserver/StorageMetrics.actor.h#L388). I am using single replication, so I assume every storage server is its own team.
- Available space multiplier: avail space ratio cutoff / min available space ratio (with the ratio clamped at the cutoff, so the multiplier is never below 1). So it inflates the load of a server once its available space drops below the cutoff, which effectively steers data toward servers with a massive amount of available space left. And available space seems to include space taken by free pages (https://github.com/apple/foundationdb/blob/master/fdbserver/KeyValueStoreSQLite.actor.cpp#L1973)
So of the two metrics it uses, available_bytes and stored_bytes, neither should (?) be affected by vacuuming, since sqlite free pages and free OS space both contribute to available_bytes. Then why did the distribution start evening out after I started vacuuming? And at the point when the servers had reached equal data distribution, why did it keep streaming data to the newer servers?
On a side note, I’ve noticed that after a cluster expansion, basically no disk space was being reclaimed with the default spring cleaning settings (even for weeks/months following the expansion). I set
VACUUMS_PER_LAZY_DELETE_PAGE to 1 and space started being reclaimed at a steady pace. Why is it set to 0 by default?