Hm, I’m not sure which is the better design starting from a tabula rasa, but I believe the heartbeat idea should work and doesn’t require an FDB API change.
In the Record Layer, there’s a SynchronizedSession class that is supposed to handle some of those details: https://github.com/FoundationDB/fdb-record-layer/blob/master/fdb-extensions/src/main/java/com/apple/foundationdb/synchronizedsession/SynchronizedSession.java
The way it works is there are two keys: (1) one with a UUID indicating the current holder of the lock and (2) another containing the time at which the lock’s lease expires. Then to do things, the lock holder reads the UUID key, verifies it is still their own UUID, and updates the expiration key with their current timestamp plus some lease period. Observers read the UUID key and the expiration key, and if their own time is earlier than the stored expiration time, they assume the lock holder is still alive. If their own time is greater than the stored time, they try to claim the lock by writing their own UUID into the UUID key and updating the expiration time.
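To make the two-key protocol concrete, here is a minimal in-memory sketch in Python. It is not the Record Layer’s code: the key names, the LeaseLock class, and the dict standing in for the database are all made up for illustration, and each method is assumed to run as a single serializable transaction.

```python
import uuid

# Hypothetical key names; the real SynchronizedSession uses its own subspace layout.
LOCK_HOLDER_KEY = "lock/holder"
LEASE_EXPIRY_KEY = "lock/expiry"


class LeaseLock:
    """Toy model of the two-key lock. `store` is a dict standing in for the
    database, and each method models one serializable transaction."""

    def __init__(self, store, lease_seconds):
        self.store = store
        self.lease_seconds = lease_seconds
        self.session_id = str(uuid.uuid4())  # identifies this would-be holder

    def try_acquire(self, now):
        """Claim the lock if it is free, already ours, or its lease has lapsed."""
        holder = self.store.get(LOCK_HOLDER_KEY)
        expiry = self.store.get(LEASE_EXPIRY_KEY, 0.0)
        if holder not in (None, self.session_id) and now <= expiry:
            return False  # someone else holds it and the lease looks live
        self.store[LOCK_HOLDER_KEY] = self.session_id
        self.store[LEASE_EXPIRY_KEY] = now + self.lease_seconds
        return True

    def heartbeat(self, now):
        """Extend the lease, but only if the UUID key still names us."""
        if self.store.get(LOCK_HOLDER_KEY) != self.session_id:
            return False  # lock was taken over; stop doing work
        self.store[LEASE_EXPIRY_KEY] = now + self.lease_seconds
        return True
```

The holder would call heartbeat at the start of every unit of work and abandon the work if it returns False; the UUID check is what decides ownership, and the expiry value only tells observers when it is worth trying to take over.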
Note that in this design, it is the value of the UUID key that truly determines the lock holder/leader, not the timestamp value. This means that even in the presence of clock drift, a leader is still chosen and there is consensus on who holds the lock. (I don’t think this is in our implementation, but you could even relax this protocol by reading the UUID key at SNAPSHOT isolation level if you are okay with there temporarily being two leaders.) Using the timestamp is just a performance optimization.
That being said, there are problems that can arise with clock drift. For example, if one process’s clock drifts way ahead and then writes a timestamp way in the future, no one will be able to claim the lock until that time. This can be ameliorated by adding “escape hatches” that allow an operator to manually remove a lock, but it’s not awesome.
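As a rough illustration of that failure mode, the sketch below (Python, with hypothetical helper names and a plain dict in place of the database) shows how a lease written by a process whose clock is far ahead blocks everyone else, and what a manual escape hatch might look like.

```python
def lease_expired(observer_now, lease_expiry):
    """Observer-side check: the lock is claimable only once the observer's
    own clock has passed the stored expiry."""
    return observer_now > lease_expiry


def seconds_until_claimable(observer_now, lease_expiry):
    """How long an observer with a correct clock would have to wait."""
    return max(0.0, lease_expiry - observer_now)


def force_release(store, holder_key, expiry_key):
    """Hypothetical operator escape hatch: clear both keys so that the next
    observer can claim the lock immediately."""
    store.pop(holder_key, None)
    store.pop(expiry_key, None)


# A process whose clock reads 1_000_000 (instead of roughly 100) takes the
# lock with a 10 second lease.
store = {"lock/holder": "drifted-process", "lock/expiry": 1_000_000.0 + 10.0}

# An observer with a correct clock sees a lease that looks live for a very long time.
blocked = not lease_expired(100.0, store["lock/expiry"])
wait = seconds_until_claimable(100.0, store["lock/expiry"])
```

Here the observer would be locked out for about 11.5 days unless an operator calls something like force_release.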
I believe our backup tooling does a similar thing, but it uses database versions instead of wall clock time. Maybe @SteavedHams can correct me, but I believe that that’s what this line is doing: https://github.com/apple/foundationdb/blob/a4f12a19a3a24bd68676bef8e629d6e855c4702c/fdbclient/TaskBucket.actor.cpp#L223
That ameliorates some of the problems with clocks, but it depends on knowing how versions correlate with time, and that relationship may change in a future release. (Or in other words, the current behavior should not be considered part of the API.) Or at least that’s advice we’ve given in the past.