Fault Tolerance changes from "2 machines" to "0 machines (2 without data loss)"

binzhangebay · May 30, 2019, 12:40am

We have one FDB cluster configured across 3 data centers (two usable regions).

With usable_regions=1, status shows “Fault Tolerance - 2 machines”;
after we change to usable_regions=2, it shows “Fault Tolerance - 0 machines (2 without data loss)”.

What could cause the fault tolerance to drop to 0 machines?

fdb> status details

Using cluster file `/var/lib/foundationdb/fdb.cluster'.

Configuration:
Redundancy mode - triple
Storage engine - ssd-2
Coordinators - 5
Desired Proxies - 4

Cluster:
FoundationDB processes - 48
Machines - 36
Memory availability - 163.2 GB per process on machine with least available
Retransmissions rate - 2 Hz
Fault Tolerance - 0 machines (2 without data loss)
Server time - 05/30/19 00:32:27

Data:
Replication health - Healthy (Rebalancing)
Moving data - 0.015 GB
Sum of key-value sizes - 1.405 GB
Disk space used - 11.000 GB

alexmiller · June 5, 2019, 11:05pm

(Sorry, I had half a reply typed out to you, meant to double check with Evan, and then forgot.)

Region-ifying a cluster is supposed to be a three-step process:

1. usable_regions=1 regions=[...]
2. usable_regions=2 regions=[...] where regions has the datacenter that doesn’t currently have a full copy of the database set to a priority of -1.
3. usable_regions=2 regions=[...] where regions now has a >=0 priority for both datacenters.
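
To make the three steps concrete, here is a minimal Python sketch that builds the settings for each one. The datacenter ids `dc1` and `dc2` and the helper name `region_steps` are hypothetical, and the `datacenters`/`id`/`priority` layout is my assumption about the regions JSON, so check it against the documentation for your version. The sketch also assumes the remote datacenter already carries priority -1 in step 1, and it ignores any satellite datacenters.

```python
import json


def region_steps(primary_dc, remote_dc):
    """Return (usable_regions, regions) for each of the three steps.

    Assumption: a two-region layout with no satellites; ids are placeholders.
    """
    def regions(remote_priority):
        return [
            {"datacenters": [{"id": primary_dc, "priority": 1}]},
            {"datacenters": [{"id": remote_dc, "priority": remote_priority}]},
        ]

    return [
        (1, regions(-1)),  # step 1: one usable region, remote never preferred
        (2, regions(-1)),  # step 2: replicate to the remote, still priority -1
        (2, regions(0)),   # step 3: remote holds a full copy, priority >= 0
    ]


if __name__ == "__main__":
    for number, (usable, regs) in enumerate(region_steps("dc1", "dc2"), start=1):
        print(f"step {number}: usable_regions={usable}")
        print(json.dumps({"regions": regs}, indent=2))
```

The point of keeping step 2 separate is that the remote datacenter stays at priority -1 until it has a full copy of the data. Only then is it made eligible to become the primary.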

What you’re seeing would make sense to me if you went straight from Step 1 to Step 3, as the remote side would be down to a fault tolerance of 0, but the primary still has copies of the data.

It’s also possible that you only have three zoneids in the remote DC, so if you lose one machine in the remote DC, the cluster wouldn’t be able to re-replicate to a third zone there?
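
The zoneid arithmetic can be sketched as a toy model. This is not how FoundationDB's status actually computes the number, and `zone_tolerance` is a hypothetical helper. It assumes each replica must sit in a distinct zone, so the cluster can only restore full replication while at least `replicas` zones remain.

```python
def zone_tolerance(distinct_zones, replicas):
    """Toy model: return (tolerance, tolerance_without_data_loss).

    Assumption: one replica per zone; a simplification, not FDB's real logic.
    """
    # Data survives as long as at least one replica remains.
    without_data_loss = replicas - 1
    # Full replication can only be restored while enough zones are left.
    spare_zones = max(0, distinct_zones - replicas)
    return min(spare_zones, without_data_loss), without_data_loss


print(zone_tolerance(3, 3))  # three zoneids, triple redundancy
print(zone_tolerance(5, 3))  # two spare zones
```

Under this model, three zoneids with triple redundancy leave no spare zone, which would be consistent with a status line of the form "0 machines (2 without data loss)".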

…or it’s possible that there’s a bug in status.

Either way, more details on the exact steps you took and the exact layout and configuration of your cluster would be helpful.
