The “Minimum number of nodes to allow fault tolerance” is… misleading at best. I worked through one round of corrections to that page with them before, but some pieces were still disputed. FoundationDB didn’t have great public documentation at the time their article was written (and it could still be much better). They made a very respectable attempt at digesting the information that was publicly available, but it would have been nice if they had double-checked it with us before posting, especially given that we’d already met.
RF=5 means you can tolerate 4 failures in FDB, but only 2 failures in Raft. It’d be far fairer to compare the replication factor needed for a given fault tolerance:
| Failures tolerated | FDB replicas | Raft replicas |
|---|---|---|
| F=1 | 2 | 3 |
| F=2 | 3 | 5 |
| F=3 | 4 | 7 |
Which is really just comparing F+1 (FDB) vs 2F+1 (Raft). Most folks run with F=2, so that’s `triple`, or 3 replicas, in FDB, and RF=5, or 5 replicas, in Raft.
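As a quick sanity check on the table, here is a minimal sketch that computes the replica counts from the two formulas. The function name `replicas_needed` is purely illustrative and not part of any FDB or Raft API.

```python
def replicas_needed(f: int) -> dict:
    """Replicas per shard needed to tolerate f failures (illustrative helper)."""
    return {
        "FDB": f + 1,       # F+1 copies: at least one copy survives F failures
        "Raft": 2 * f + 1,  # 2F+1 members: a majority (F+1) survives F failures
    }

for f in (1, 2, 3):
    print(f"F={f}: {replicas_needed(f)}")
```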
That table was also trying to show that FDB needs some extra nodes to be able to reconfigure in reaction to a failure, which is true. The minimum size for a 3-replica cluster in FDB is 5 nodes, so that if 2 of them fail, there are 3 left to reconfigure onto. The minimum size of a Raft-based DB that survives 2 failures is also 5. This is why their table shows FDB: 5 for RF=3.
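For a single shard, the two minimums coincide. A minimal sketch of that arithmetic, assuming the simplified model above (F+1 replicas plus F spare nodes for FDB, one 2F+1 group for Raft); the helper names are hypothetical:

```python
def min_cluster_size_fdb(f: int) -> int:
    # F+1 replicas, plus F spare nodes so that after F failures there are
    # still F+1 healthy nodes to reconfigure onto (simplified model).
    replicas = f + 1
    return replicas + f

def min_cluster_size_raft(f: int) -> int:
    # A single Raft group of 2F+1 members tolerates F failures.
    return 2 * f + 1

print(min_cluster_size_fdb(2), min_cluster_size_raft(2))
```

Both work out to 2F+1 for one shard, which is why the small-cluster comparison looks like a tie.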
However, this is misleading once the cluster grows. FDB requires (3 replicas × number of shards) + 2 nodes to survive 2 failures. Raft requires 5 replicas × number of shards nodes to survive 2 failures. As the number of shards approaches infinity, FDB therefore requires 40% fewer nodes (3/5 as many), whereas their table presents the opposite view.
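To see the ratio converge, here is a minimal sketch that mirrors the simplified counting above, where each replica of each shard is assumed to occupy its own node (real deployments pack many shards per node, so treat this as an illustration of the scaling only). The helper names are hypothetical.

```python
def nodes_fdb(shards: int, f: int = 2) -> int:
    # (F+1) replicas per shard, plus F spare nodes for reconfiguration.
    return (f + 1) * shards + f

def nodes_raft(shards: int, f: int = 2) -> int:
    # (2F+1) replicas per shard; no spares needed beyond the quorum.
    return (2 * f + 1) * shards

for shards in (1, 10, 100, 10_000):
    fdb, raft = nodes_fdb(shards), nodes_raft(shards)
    print(f"shards={shards:>6}: FDB={fdb:>6} Raft={raft:>6} ratio={fdb / raft:.3f}")
```

The ratio starts at 1.0 for a single shard and tends toward 3/5 = 0.6, i.e. 40% fewer nodes for FDB.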
The tradeoff for this cost savings is an unavailability window during the reconfiguration step. In chain-replicated databases, this is shard unavailability while the chain is reconfigured. In FDB, it is write unavailability across all shards. It is only triggered by the failure of a node running a key transaction subsystem component, though, and those nodes are typically about 10% of the cluster, so there’s roughly a 10% chance that a node failure causes a few seconds of unavailability. Raft-based databases have temporary shard unavailability whenever a leader fails, so on a large enough cluster every node failure will make some shards unavailable until a new Raft leader is elected, but that’s only a small fraction of the shards.
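The contrast is "rare but global" versus "frequent but narrow". A minimal sketch of that per-failure picture, assuming the ~10% transaction-subsystem share from the text and, for Raft, that shard leaders are spread evenly so a failed node leads about 1/N of the shards. Both helpers are hypothetical and ignore how long each outage lasts.

```python
def fdb_failure_impact(tx_fraction: float = 0.10) -> tuple[float, float]:
    # Returns (probability a node failure causes an outage, fraction of shards affected).
    # Only failures of transaction-subsystem nodes matter, but then writes
    # to all shards pause during recovery.
    return tx_fraction, 1.0

def raft_failure_impact(nodes: int) -> tuple[float, float]:
    # On a large cluster every node leads some shards, so every failure causes
    # an outage; assuming balanced leaders, it affects ~1/nodes of the shards.
    return 1.0, 1.0 / nodes

for name, (p, frac) in {
    "FDB": fdb_failure_impact(),
    "Raft (100 nodes)": raft_failure_impact(100),
}.items():
    print(f"{name}: P(outage)={p:.2f}, shards affected={frac:.2%}")
```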
I personally view the tradeoff as: FDB gives you 99.95% availability and you pay for 3 replicas per shard; Raft gives you 99.99% availability and you pay for 5 replicas per shard. Choose based on what availability is financially worth to you.
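To put those two figures in concrete terms, here is a minimal sketch converting an availability target into a yearly downtime budget (365-day year assumed; the helper name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years

def downtime_minutes_per_year(availability: float) -> float:
    # Fraction of the year the service may be down, expressed in minutes.
    return (1.0 - availability) * MINUTES_PER_YEAR

for label, a in (("FDB (~99.95%)", 0.9995), ("Raft (~99.99%)", 0.9999)):
    print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")
```

That is roughly 263 versus 53 minutes per year, which you can weigh against the cost of 3 versus 5 replicas per shard.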