Migrating from a large cluster to another

killertypo · September 24, 2018, 4:38pm

Howdy, you didn’t mention which storage engine you are using (SSD(2) or Memory), but this is something we at Wavefront do quite often.

We run clusters operating at millions of writes per second and tens of thousands of reads (depending on load).

We call these operations a Fleet Replacement: sometimes we need to increase total resources, and we start by dropping in a fresh fleet of hardware that is, say, 2x as powerful and runs a slightly different number of machines.

Now to what you probably care about

While our primary use case right now is FDB3 (we are slowly migrating to FDB5, re: How to prevent tlogs from overcommitting), a lot of our techniques for safe replacement and for preventing outages have worked well.

I’d first ask whether you can temporarily suspend workloads in your cluster. Also, what kind of IO throughput do you have, and which storage engine are you using?

Typically, when we work with clusters of between 250 and 400 processes, we suppress all workloads and enqueue them to be processed later, so that we can join a new host to the cluster without worrying about the tlogs becoming overwhelmed and our CC (cluster controller) wreaking havoc. We then slowly ramp up the enqueued workload while the cluster rebalances and shifts the load. This is much faster on the memory tier than on the SSD tier, and if you are already IO constrained on the SSD tier, rebalancing will be quite costly.
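To make the suspend / enqueue / ramp-up idea concrete, here is a rough sketch of what such a client-side buffer could look like. This is not our actual tooling and it makes no FoundationDB calls; the `RampedQueue` class, its parameters, and the `healthy` flag are all hypothetical names for illustration. In practice `healthy` would come from whatever signal you trust (for example, tlog queue sizes from your monitoring).

```python
from collections import deque


class RampedQueue:
    """Hypothetical buffer: holds writes while suspended, then drains at a growing rate."""

    def __init__(self, start_rate, max_rate, step):
        self.pending = deque()
        self.rate = start_rate    # items released per tick
        self.max_rate = max_rate  # ceiling for the release rate
        self.step = step          # rate increase after each healthy tick
        self.suspended = True     # start suspended: nothing reaches the cluster

    def submit(self, item):
        # All work is buffered; callers never write to the cluster directly.
        self.pending.append(item)

    def resume(self):
        # Call once the new hosts have joined and you are ready to ramp up.
        self.suspended = False

    def tick(self, healthy=True):
        """Return the batch of items to apply during this tick."""
        if self.suspended:
            return []
        if not healthy:
            # Assumed policy: halve the rate when the cluster looks stressed.
            self.rate = max(1, self.rate // 2)
        count = min(self.rate, len(self.pending))
        batch = [self.pending.popleft() for _ in range(count)]
        if healthy:
            # Grow the rate gradually while rebalancing keeps up.
            self.rate = min(self.max_rate, self.rate + self.step)
        return batch
```

You would call `tick()` on a fixed interval and apply the returned batch to the cluster. The halving on an unhealthy tick is just one possible back-off policy; on an IO-constrained SSD tier you would likely want a lower `max_rate` and a smaller `step`.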
