Region failover: the criteria for auto failover and incremental/full refresh

lehu · November 23, 2021, 8:58pm

We deploy our FDB clusters across two regions: region1 consists of DC1 and DC2 (or AZ1 and AZ2), and region2 consists of DC3.
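For readers who want the topology in concrete form, here is a rough sketch of how a layout like this might be written as a `regions` document of the kind `fdbcli`'s `fileconfigure` takes. It is only an illustration: the datacenter ids (`dc1`, `dc2`, `dc3`), the priorities, the satellite redundancy mode, and above all the assumption that DC2 acts as a satellite of DC1 are guesses, not details from our actual configuration. Optional fields are left out, so treat it as a starting point rather than a validated config.

```python
import json


def build_regions(primary_dc="dc1", satellite_dc="dc2", remote_dc="dc3",
                  primary_priority=1, remote_priority=0):
    """Build a two-region `regions` document as a plain dict.

    All ids and priorities are hypothetical placeholders. Treating
    `satellite_dc` as a satellite of the primary region is an assumption.
    """
    return {
        "regions": [
            {
                # region1: primary DC plus an assumed satellite DC
                "datacenters": [
                    {"id": primary_dc, "priority": primary_priority},
                    {"id": satellite_dc, "priority": 0, "satellite": 1},
                ],
                # assumed mode; pick whatever the real cluster uses
                "satellite_redundancy_mode": "one_satellite_double",
            },
            {
                # region2: a single DC
                "datacenters": [
                    {"id": remote_dc, "priority": remote_priority},
                ],
            },
        ]
    }


if __name__ == "__main__":
    # Print the JSON that could be saved to a file and reviewed by hand.
    print(json.dumps(build_regions(), indent=2))
```

The higher priority on `dc1` expresses that region1 is the preferred active region when both regions are healthy.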

The company is mandating more frequent OS patching of the underlying physical machines on which our FDB Kubernetes pods run. We currently use the following approach to accommodate OS patching for large FDB clusters (>100 pods at DC1):

1. Shut down the containers at DC1 (let the pods run a dummy image).
2. FDB fails over to DC3 and runs in single-region/single-DC mode.
3. Let the Kubernetes team run OS patching on the nodes at DC1 in parallel.
4. After patching finishes, bring the containers at DC1 back up.
5. FDB restores to two-region mode.
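To tell when the last step has actually finished, one option is to watch the replication lag between regions. The sketch below is a minimal, unverified example: it assumes the machine-readable status (for instance the output of `fdbcli --exec "status json"`) has already been parsed into a dict and that it carries a `cluster.datacenter_lag.seconds` field. The helper names and the 5-second threshold are made up for illustration and are not FDB defaults.

```python
import json


def datacenter_lag_seconds(status):
    """Return cluster.datacenter_lag.seconds from a parsed status dict.

    Returns None when the field is missing, so the caller can treat
    "unknown" differently from "caught up".
    """
    lag = status.get("cluster", {}).get("datacenter_lag")
    if not isinstance(lag, dict):
        return None
    return lag.get("seconds")


def caught_up(status, max_lag_seconds=5.0):
    """True only if the lag is known and within the (hypothetical) threshold."""
    lag = datacenter_lag_seconds(status)
    return lag is not None and lag <= max_lag_seconds


if __name__ == "__main__":
    # Hand-written sample, not real cluster output.
    sample = json.loads(
        '{"cluster": {"datacenter_lag": {"seconds": 1.5, "versions": 1500000}}}'
    )
    print(caught_up(sample))
```

A missing field is deliberately reported as "not caught up", since an unreadable status during a maintenance window is not evidence that the resync is done.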

Questions regarding the auto failover process:

1. What happens if, during Step 3 above, some pods in DC1 actually come back up? Will FDB recruit them into the cluster? Will FDB fail back to DC1 if enough pods have come back (we want to avoid this during Step 3)?
2. What criteria does FDB use for automatic failover? 50% of nodes unhealthy?
3. In Step 5, FDB will sync from DC3 to DC1. What criteria does FDB use to choose between an incremental sync and a full refresh? Elapsed time, the amount of transaction logs?

Thanks.
