Let’s say I have 3 FDB clusters: clusters A, B, and C. User data is divided between B and C depending on the users’ physical location and data protection requirements. Cluster A holds only the metadata for determining which users are in which cluster, and isn’t particularly important to the rest of this example beyond establishing that an external mapping determines which data lives where.
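For concreteness, here is a minimal sketch of what that routing metadata might look like. The names (`ROUTING`, `cluster_for`, `needs_2pc`) are hypothetical, and a plain dict stands in for what would really be a read against cluster A:

```python
# Hypothetical routing metadata held in cluster A.
# In a real deployment this would be an FDB read; a dict stands in here.
ROUTING = {
    "alice": "B",   # e.g. an EU user assigned to cluster B
    "bob":   "B",
    "carol": "C",   # e.g. a US user assigned to cluster C
}

def cluster_for(user_id: str) -> str:
    """Return the cluster holding this user's data."""
    return ROUTING[user_id]

def needs_2pc(sender: str, receiver: str) -> bool:
    """A payment needs 2PC only when the parties live in different clusters."""
    return cluster_for(sender) != cluster_for(receiver)
```

Most pairs of interacting users share a cluster, so `needs_2pc` is false for them and a single FDB transaction suffices.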
User data is entirely self-contained within one cluster or the other, but occasionally users interact across clusters. You can imagine a payment application where users can send payments to each other. Most users will interact with users who are near them and therefore in the same cluster, removing the need for any transaction outside of FDB. Sometimes a user will make a transfer to a user in the other cluster, and a sane choice for doing this atomically is two-phase commit (2PC).
I’m not asking how to implement 2PC in general, but rather how to best handle a few parts of the protocol in FDB effectively.
2PC notably only ensures atomicity, so isolation must be ensured jointly by the layer and FDB. This comes in (at least) two parts: hiding uncommitted data from concurrent transactions in either “database”, and ensuring a transaction can always commit locally if asked to after it has successfully responded to a prepare.
If you define each “database” as a single user’s data in one cluster, you could add a lock key for each user that says “there is a 2PC in progress for this user”, and have every transaction for that user not participating in a 2PC add a read conflict range on that key. The 2PC transaction writes to that key in the prepare phase before acknowledging. This ensures concurrent transactions for that user’s data block or fail until the 2PC is over, while concurrent transactions for other users are unaffected. You can assume all data for this user is contained within this database, including secondary indexes etc. (so no need to worry about dirty/stale reads or blocking other txns there). Any concurrent, read-only transaction that ignores the lock does so at its own risk, and writing anything at all would violate the rules unless a more fine-grained locking scheme were used.
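The lock discipline above could be sketched roughly as follows. This is not real FDB code: a dict stands in for one cluster’s key space, and the names (`lock_key`, `prepare_2pc`, `ordinary_txn`, `UserLocked`) are mine. In real FDB, simply reading the lock key inside a transaction adds the read conflict range, so a concurrent prepare that writes it forces a conflict:

```python
# Sketch of the per-user lock discipline over an in-memory key space.
class UserLocked(Exception):
    pass

KV = {}  # stand-in for one FDB cluster's key space

def lock_key(user_id):
    # In real FDB this would live in the user's subspace.
    return ("lock", user_id)

def prepare_2pc(user_id, txn_id):
    """Prepare phase: take the per-user lock before acknowledging."""
    if KV.get(lock_key(user_id)) is not None:
        raise UserLocked(user_id)     # another 2PC is already in flight
    KV[lock_key(user_id)] = txn_id    # written in the same FDB transaction

def ordinary_txn(user_id, mutate):
    """A non-2PC transaction for this user reads the lock key (which in
    FDB adds a read conflict range on it) and bails if it is held."""
    if KV.get(lock_key(user_id)) is not None:
        raise UserLocked(user_id)
    mutate(KV)
```

Note that because the lock is per-user, `prepare_2pc("alice", ...)` blocks Alice’s ordinary transactions but leaves Bob’s completely untouched, which is the property the paragraph above is after.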
If all transactions obey this lock, it should be possible to implement the guarantees 2PC requires of its participants, correct? From a performance perspective, the per-user lock is not ideal, but it could be improved. Ideally the 2PC transaction is not interactive, to minimize the time the lock is held.
The other bits are not as interesting or complicated. In the prepare phase, you log all the operations you’d like to do (set this key, clear this range, etc.) along with the ID of the active transaction. When the transaction logging those operations commits, you acknowledge the prepare. When the transaction manager asks you to commit, you replay the log of operations, applying the mutations to the local FDB cluster; because you hold the lock, you can commit knowing no other transaction could possibly conflict with you.
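The prepare-log-then-replay step might look something like this sketch. Again the storage is an in-memory dict rather than FDB, and the op encoding (`set` / `clear_range` records) is just an illustrative assumption:

```python
import json

LOG = {}    # stand-in for a durable operations log, keyed by 2PC txn id
STORE = {}  # stand-in for the local cluster's user data

def prepare(txn_id, ops):
    """Prepare: durably log the intended mutations, then acknowledge.
    In real FDB, committing this write is what makes the log durable."""
    LOG[txn_id] = json.dumps(ops)

def commit(txn_id):
    """Commit: replay the logged ops against the local cluster.
    Holding the per-user lock guarantees no conflicting writers."""
    for op in json.loads(LOG[txn_id]):
        if op["type"] == "set":
            STORE[op["key"]] = op["value"]
        elif op["type"] == "clear_range":
            for k in [k for k in STORE if op["begin"] <= k < op["end"]]:
                del STORE[k]
    del LOG[txn_id]  # clean up; the real version would also release the lock
```

Replaying the log and deleting the log entry would happen in one FDB transaction, so a commit that is retried after a failure either sees the log entry and replays it, or sees nothing and knows it already applied.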
The failure scenarios are also not made any more difficult by FDB (if anything, they’re easier to handle), because the roles of the transaction manager and “databases” in the protocol are not single physical machines, but logical entities that any stateless layer process can drive.
- Is my description of locking and logging safe (as far as you can tell from my probably under-specified words), and did I miss any bits that could be optimized to skip work entirely (like not locking at all, which I don’t think is possible)?
- Is my assumption true that FDB only makes the failure scenarios easier to handle? I understand it will be slower than if each participant were a physical machine operating purely by itself, where a transaction could commit in, say, <1ms vs a 3-4ms minimum in FDB, but I don’t think this makes it any more difficult to implement 2PC correctly.
Sorry for the wall of text! Hopefully this discussion can spark some interest among people who are interested in using FDB as the building block for a global distributed database.