There was a thread that discussed some misinterpretable aspects of that table:
That thread also led to some edits to the table's wording.
`Fault Tolerance` will always be, at best, `Total Number of Copies - 1`. So `Fault Tolerance = 2` is the best that you will get with `triple` replication.
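To make that bound concrete, here is a minimal sketch of the arithmetic. The function and dictionary names are my own, not anything from FDB itself, and the result is only the upper bound described above, not what status will necessarily report for a given cluster.

```python
# Hypothetical helper illustrating the "copies - 1" upper bound.
# Copies per redundancy mode: single = 1, double = 2, triple = 3.
COPIES_BY_MODE = {"single": 1, "double": 2, "triple": 3}


def max_fault_tolerance(total_copies: int) -> int:
    """Best-case number of copies that can be lost while one still survives."""
    if total_copies < 1:
        raise ValueError("need at least one copy of the data")
    return total_copies - 1


for mode, copies in COPIES_BY_MODE.items():
    print(f"{mode}: at best {max_fault_tolerance(copies)} failure(s) tolerated")
```

For `triple`, this gives 2, which is the number the table was trying to convey.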
Also see Optimal configuration for more than 3 DCs and a linked thread therein for a longer discussion on FDB on more than two regions.
Note that FDB is not a quorum-based system. The “a majority of nodes is required” logic only applies to coordinators, which do form a quorum-based system. Otherwise, the number of processes involved in holding a shard of data is explicitly set as part of the configuration, as the table attempted to describe.
I think a 2-region `triple` configuration will give you the highest effective fault tolerance right now. If all copies are lost in one region, data distribution in multi-region understands how to copy it back across. You’d be able to theoretically lose 5 machines and FDB would still be able to recover, as long as you run 11 coordinators. I recall the status message wording being a bit weird with multi-region currently, so I think it will still show `2`, but will eventually tell you `but (N) without data loss` if you start killing enough things.
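The "11 coordinators for 5 lost machines" figure is just majority-quorum arithmetic. A rough sketch, with function names that are mine and purely illustrative, assuming the worst case where every lost machine was hosting a coordinator:

```python
# Hypothetical helpers for majority-quorum arithmetic on coordinators.
# Assumption: each failed machine takes exactly one coordinator with it.


def coordinator_failures_tolerated(num_coordinators: int) -> int:
    """Coordinators that can be lost while a strict majority remains."""
    if num_coordinators < 1:
        raise ValueError("need at least one coordinator")
    return (num_coordinators - 1) // 2


def coordinators_needed(failures_to_survive: int) -> int:
    """Smallest coordinator count that keeps a majority after that many losses."""
    if failures_to_survive < 0:
        raise ValueError("failure count cannot be negative")
    return 2 * failures_to_survive + 1


print(coordinator_failures_tolerated(11))
print(coordinators_needed(5))
```

This only covers the coordinator side. Whether the data itself survives still depends on the replication configuration, as described above.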
I’ve typically seen people plan for 1 planned and 1 unplanned outage, so triple replication and a fault tolerance of 2 has generally been fine for most folk. Any reason you’re extra concerned about copies of your data suddenly going missing or corrupt?