I am building a layer which provides the ability for clients to take locks on logical keys. A correct distributed lock needs to fulfill both safety and liveness (good introduction: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html).
Skip to the end to see my specific question if you’re not interested in reading my detailed locking procedure
Safety is easy with FoundationDB: use the committed version of the database for fencing and rely on the transactional nature of the database for the rest. I understand this version increases at roughly 1 million per second, which I have demonstrated to myself locally on my machine too.
Liveness is where I see a larger challenge.
The layer will be implemented as a process co-located on the FoundationDB nodes (or on machines in the same datacenter) and speaks the Redis protocol because essentially every programming language has at least one client library for Redis. Clients will not talk to FoundationDB directly, so locking needs to be implemented at the layer level.
Here is my idea for liveness:
- Clients connect to the layer process on any machine it is deployed to and request a “lease token” which uniquely identifies their session within the system. All locks will live under this lease. The lease specifies a timeout in milliseconds after which it should be considered expired without a refresh. The lease key is written to the database along with the read version of when it was created as the value.
- The client periodically sends a refresh command to the layer. The layer updates its knowledge of this time using the system’s monotonic clock API. It additionally updates the lease key in the database with the read version of the transaction doing the update.
- The client attempts to acquire a lock with a lock command specifying the lease ID as well as the logical key it would like to lock. If the acquisition is successful, the lock’s key is set to the value of the lease ID which is holding the lock. The request returns the committed version of the database at which the lock was acquired. If the lock acquisition failed the layer returns -1 to indicate a failure. Other information could be included with the failure like which lease currently holds the lock and when it would expire without any more renewals.
- The client makes requests to other parts of the system using that lock and can use the committed version of the database as a fencing token for writing to external systems.
- Any layer processes attempting to acquire a lock will examine the version of the lease that holds a lock if it is held. It will then expire any leases within the same transaction if the leases are expired.
That’s the happy path. Here are a few (non-exhaustive) failure cases:
- The layer process crashes, resulting in the connections to all client processes being closed without an opportunity to cleanly release all the locks.
- The client process crashes, resulting in the client’s connection being closed without cleaning releasing any lock held by the client’s lease.
- The layer process loses the connection to FoundationDB, but still has connectivity to the client.
- The layer process and client process crash simultaneously.
Here is how I would handle each one, respectively:
- The client retains the lease ID and can send it to a different layer process after it reconnects. The layer process can examine the lease ID and the corresponding version of the database to determine if the lease should be considered expired. My direct question about this is at the end of the post.
- The client durably stores the lease ID if desired (and proceed similar to solution 1.), or can ignore any prior leases and create a new lease. This means the client essentially starts over and the lease will not receive any priority over any other leases attempting to acquire or hold the lock.
- The data the lock is protecting as well as the locks themselves will be contained within FoundationDB, so losing connection to FoundationDB will not allow the client to make progress, nor update the client’s lease, so the client should close its connection and attempt to reconnect to a different layer process (proceed similar to 1. and 2.).
- The client would attempt to reconnect to a new layer process when it restarts (proceed similar to 1. and 2.).
Do you see any holes in this procedure?
Specific question portion:
I have read elsewhere on the forum that the version of the database can proceed much faster or slower than 1 million per second for multiple reasons. Because of that, I would like to use the layer’s local monotonic clock during non-failure scenarios to override the database version if the layer detects a large deviation in what would be expected at 1 million versions per second. I would like this to be purely a performance optimization and not a correctness violation, obviously. If each layer process can detect this without any coordination other than reading the database version (which is monotonic and a rough timer itself), is it still safe?
If you read the entire locking procedure, do you spot an logical issues with either safety, liveness, or any obvious performance problems? There would be potentially tens of thousands of leases covering hundreds of thousands of locks with lease timeouts being in the neighborhood of 5-10 seconds depending on the tolerance for downtime of processing during a failure. Even at the high end of what I’ve imagined, I don’t think the system should result in more than a couple thousand transactions per second to manage the locks.
I would also appreciate any experience from existing FoundationDB users on implementing locks at the layer level if that has been done.