Question about FoundationDB's SIGMOD '21 paper

Hi, after rereading FDB’s paper, I have a question: why does FDB use Active Disk Paxos for high availability rather than a regular Multi-Paxos or Raft consensus protocol?

I think this is related to how FDB is upgraded. By design, FDB has no external dependencies, so all configuration data is stored in FDB itself, i.e., in the coordinators. This rules out storing configuration data in an external service such as ZooKeeper.

To upgrade an FDB cluster, the design is to bounce all processes in the cluster, which simplifies protocol compatibility across versions. After the bounce/restart, all configuration data is read back from the disk state persisted on multiple machines, so Disk Paxos was chosen as the consensus protocol.
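
As a rough illustration of that recovery path (hypothetical file names, layout, and `version` field; not FDB's actual on-disk format), after a bounce a process could reconstruct the configuration by reading whatever each coordinator persisted locally and keeping the newest copy found on a majority:

```python
# Toy sketch of reading configuration back from coordinator disk state after a
# full-cluster bounce. The paths, the JSON layout, and the "version" field are
# invented for illustration; FDB's real recovery is more involved than this.

import json
from pathlib import Path


def read_config_after_bounce(state_files: list[Path]) -> dict:
    """Return the newest configuration readable from a majority of coordinators."""
    majority = len(state_files) // 2 + 1
    copies = []
    for path in state_files:
        try:
            copies.append(json.loads(path.read_text()))
        except (OSError, json.JSONDecodeError):
            continue  # that coordinator's state is missing or unreadable
    if len(copies) < majority:
        raise RuntimeError("fewer than a majority of coordinators are readable")
    # Each persisted copy records the version at which it was written; newest wins.
    return max(copies, key=lambda c: c["version"])
```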

Thank you for your reply! Then another question: why did you choose to implement Active Disk Paxos rather than a normal consensus protocol like Multi-Paxos or Raft?

As far as I know, normal consensus protocols also require durability: they can flush configuration data to disk before a restart and read it back afterwards. So I think the difference between normal consensus protocols and Disk Paxos is small.

Are there any other reasons that encouraged you to choose Active Disk Paxos?

There was an idea that coordinators shouldn’t need to know about each other, nor should they be required to be part of the cluster that they’re coordinators for. Coordinators learn that they’re coordinators only because some process tries to talk to them as if they’re a coordinator. (A coordinator doesn’t need to have itself listed as a coordinator in its own cluster file to be a coordinator for that cluster.) Coordinators can also be shared across multiple (separate) clusters. If you try to design a consensus implementation where the actual processes participating in consensus don’t communicate directly, you’re basically implementing Disk Paxos. Active Disk Paxos extends Disk Paxos in ways that remove some of its bounds and limitations, notably the requirement for a fixed, known set of participating processes.
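
To make that contrast concrete, here’s a minimal sketch of the Disk Paxos pattern (names and structure invented, in Python; this is not FDB’s actual implementation): the coordinators act as purely passive per-proposer registers, and all of the protocol logic runs inside whichever process is trying to get a value decided, so the coordinators never have to talk to each other.

```python
# Toy sketch of one ballot of Disk Paxos (Gafni & Lamport), illustrating the
# point above: the "disks" (the coordinators) only store and return blocks,
# and the proposer does all of the work. Invented names; not FDB's code.

from dataclasses import dataclass


@dataclass
class Block:
    """One proposer's record on one disk."""
    mbal: int = 0       # highest ballot this proposer has started
    bal: int = 0        # ballot at which it last accepted a value
    val: object = None  # the value it accepted, if any


class Disk:
    """A passive coordinator: it only stores and returns blocks."""
    def __init__(self):
        self.blocks: dict[int, Block] = {}  # proposer id -> its block

    def write(self, pid: int, block: Block) -> None:
        self.blocks[pid] = block

    def read_others(self, pid: int) -> list[Block]:
        return [b for other, b in self.blocks.items() if other != pid]


def propose(pid: int, my_value, disks: list[Disk], ballot: int):
    """Run one ballot as proposer `pid`; return the decided value, or None if
    a higher ballot was observed (the caller would retry with a larger ballot)."""
    majority = len(disks) // 2 + 1
    block = Block(mbal=ballot)

    for phase in (1, 2):
        seen, reached = [], 0
        for disk in disks:
            # Write our block, then read everyone else's blocks from that disk.
            disk.write(pid, block)
            others = disk.read_others(pid)
            if any(b.mbal > ballot for b in others):
                return None  # someone started a higher ballot; abort
            seen.extend(others)
            reached += 1
        if reached < majority:
            return None  # couldn't reach a majority of disks

        if phase == 1:
            # Adopt the value accepted at the highest ballot, if any;
            # otherwise we're free to propose our own value.
            accepted = [b for b in seen if b.val is not None]
            chosen = max(accepted, key=lambda b: b.bal).val if accepted else my_value
            block = Block(mbal=ballot, bal=ballot, val=chosen)

    return block.val


if __name__ == "__main__":
    coordinators = [Disk() for _ in range(3)]
    print(propose(pid=1, my_value="new-configuration", disks=coordinators, ballot=1))
```

(In this toy version every disk is reachable, so `reached` always equals `len(disks)`; the majority check only matters once you model unreachable coordinators.)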

But honestly, I’ve really struggled to follow the paper, and never managed to match it up against the actual Paxos implementation in FDB. “Active Disk Paxos” is the label we went with because, when I talked to the person who wrote the implementation many years later, he said “oh, yeah, that sounds about right”, but it’s entirely possible that if someone invested the effort to match the implementation against the paper, there’d be notable deviations from what the paper outlines.

In the decade since all these decisions were made, I don’t think I’ve seen anyone deploy a shared set of coordinators, or a deployment that, for operational reasons, couldn’t have coordinators know about each other. I can sort of see how, a decade-plus ago when bare-metal deployments were very common, being able to run one set of very widely geographically distributed/replicated coordinators and reuse them for all your clusters sounded like a good feature, but the reality of deployments (and protocol-version compatibility limitations) hasn’t worked out that way. So if this were all rewritten today, I don’t think there’s a reason to strongly prefer Disk Paxos over a leadered or leaderless Paxos. But there’s also no reason to rewrite the Paxos implementation that currently exists, because it works just fine.


Got it!

Your reply completely answers my questions. Thanks a lot! 🙂