Hi, we use FoundationDB as the storage server for multiple clients.
In our system, each client accesses a private range of keys. If a
client fails, the system may create a new client for that range, or
assign an idle client to it.
If the failed client accidentally accesses fdb again, it may overwrite
changes made by the new client and cause a data consistency problem
(known as split-brain).
To prevent this, we want to implement some kind of fencing in fdb.
After some searching and reading, we came up with two options:
1. Make a fencing key, and check whether its value has changed in every
write transaction. Once a client has been replaced, update the fencing
key to disable the failed client. This will work, but it may reduce
write performance (maybe not by much?) because fdb has no server-side
value check; the client has to read the value back to check it.
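The first option can be sketched like this. This is a toy simulation against an in-memory dict rather than a real fdb cluster, and the names (`fencing_key`, `my_epoch`, `fenced_write`) are illustrative; inside a real fdb transaction, the read of the fencing key would also put it into the read-conflict set automatically.

```python
# Toy sketch of option 1: read the fencing key in every write transaction
# and abort if another client has taken over the range.

class FencedOut(Exception):
    """Raised when another client has taken over our key range."""

store = {"fence/range1": 1}  # fencing key holds the current owner's epoch

def fenced_write(store, fencing_key, my_epoch, key, value):
    # In real fdb this read goes to the server, which is the performance
    # cost discussed above; there is no server-side compare-and-set.
    if store.get(fencing_key) != my_epoch:
        raise FencedOut(f"fence changed; client with epoch {my_epoch} is stale")
    store[key] = value

fenced_write(store, "fence/range1", my_epoch=1, key="k", value="v")  # succeeds
store["fence/range1"] = 2  # new client takes over and bumps the fence
try:
    fenced_write(store, "fence/range1", my_epoch=1, key="k", value="v2")
except FencedOut:
    pass  # old client is fenced off; its write never lands
```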
2. Also make a fencing key, but in the write transaction:
- set the transaction's read version to the "right version" of the fencing key
- add the fencing key as a read conflict key
- do the writes…
The "right version" is a read version at which the fencing key's value
is known to be unchanged.
Using a conflict key should have very little performance impact, but to
avoid the transaction_too_old error we have to refresh the "right
version" often enough. (Learned from the block device layer, thanks a lot.)
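To make the conflict-based option concrete, here is a toy MVCC model of it. The `ToyDB` class and its version bookkeeping are invented for illustration; in real fdb, the commit-time check below is what the resolver does when the fencing key is in the transaction's read-conflict ranges and the cached read version is older than the fence's last commit.

```python
# Toy model of option 2: cache the read version at which the fencing key
# was last known-good, and fail (conflict) any write whose cached version
# is older than the fencing key's last modification.

class Conflict(Exception):
    pass

class ToyDB:
    def __init__(self):
        self.version = 0
        self.data = {}       # key -> value
        self.modified = {}   # key -> version of last write

    def set(self, key, value):
        self.version += 1
        self.data[key] = value
        self.modified[key] = self.version

    def fenced_write(self, fencing_key, cached_read_version, key, value):
        # Stands in for: txn.set_read_version(cached_read_version) plus
        # adding the fencing key to the read-conflict ranges.
        if self.modified.get(fencing_key, 0) > cached_read_version:
            raise Conflict("fencing key changed since cached read version")
        self.set(key, value)

db = ToyDB()
db.set("fence", "epoch-1")
good_version = db.version                          # client caches this
db.fenced_write("fence", good_version, "k", "v")   # commits fine
db.set("fence", "epoch-2")                         # new client bumps the fence
try:
    db.fenced_write("fence", good_version, "k", "v2")
except Conflict:
    pass  # stale client's commit is rejected
```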
Are there any problems with either method? Are there other, better ways to do it?
Thanks
The first approach is simpler and would add 2-3 ms per transaction to fetch the fencing key. You can avoid that penalty if, in the client code, you don't wait on the fencing value immediately but concurrently continue with other reads/writes, and only later check the value and abort the transaction if the key has changed. If changes to the fencing key are rare, this will almost never cause an abort, making it a practically free operation.
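The "don't wait immediately" pattern might look like the sketch below. A thread-pool future stands in for an fdb future here, and the names are illustrative; in a real fdb transaction the writes are buffered until commit, so aborting before commit really does discard them.

```python
# Sketch: issue the fencing-key read up front, overlap it with the
# transaction's other work, and only block on it just before commit.

from concurrent.futures import ThreadPoolExecutor

store = {"fence": 1}

def transaction(my_epoch):
    with ThreadPoolExecutor(max_workers=1) as pool:
        fence_future = pool.submit(store.get, "fence")  # fire the read early
        # ... the transaction's other reads/writes proceed concurrently ...
        store["k"] = "v"
        # Only now wait on the fencing value, just before "commit".
        if fence_future.result() != my_epoch:
            raise RuntimeError("fenced: abort before commit")

transaction(my_epoch=1)  # fence unchanged, so the transaction goes through
```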
The second approach, as I understand it, is an optimization that caches the fencing key and the associated read version. This will work, but it adds some complications: a client's subsequent transactions may conflict with its own older transactions, because an older read_version is being used. It also bypasses, to some extent, the ratelimiter and its ability to throttle transactions under load.
How frequent are changes to the fencing keys? If they are not very frequent, another optimization can be used: there is a feature by which you can get a fixed metadata key as part of every transaction for free.
You can transactionally change this global metadata key whenever you change any fencing key. Clients can compare the metadata key against a locally cached value, and fetch the particular fencing key only if the global metadata key has changed.
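The caching pattern described above could look roughly like this. It is a toy in-memory model with invented names (`Client`, `fence_value`, `meta_version`); the real fdb counterpart of the cheap per-transaction key is `\xff/metadataVersion`, and bumping it must happen in the same transaction that changes the fencing key.

```python
# Sketch: compare a cheap per-transaction metadata version against a cached
# copy, and only re-read the actual fencing key when it has changed.

class Client:
    def __init__(self, db):
        self.db = db
        self.cached_meta = None
        self.cached_fence = None
        self.fetches = 0  # count the expensive fencing-key reads

    def fence_value(self):
        meta = self.db["meta_version"]  # free with every transaction
        if meta != self.cached_meta:
            self.cached_fence = self.db["fence"]  # expensive re-fetch
            self.cached_meta = meta
            self.fetches += 1
        return self.cached_fence

db = {"meta_version": 1, "fence": "epoch-1"}
c = Client(db)
v1 = c.fence_value()  # first call actually reads the fencing key
v2 = c.fence_value()  # cache hit: no fencing-key read
db["fence"] = "epoch-2"; db["meta_version"] = 2  # bumped transactionally
v3 = c.fence_value()  # cache miss: re-fetch the new value
```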
It seems the second approach may cause some problems. I think we will go with the first approach, checking the fencing key at the end of each write transaction. Later, after upgrading to a newer version of fdb, we will try the metadata key.
About the metadata key: if someone else changes it while the current transaction is checking it, will the transaction later report an error on commit? (Because the metadata key is part of the conflict keys.)
I am almost certain it will cause a conflict, but I haven't used this feature so far. You can try it out in a local test.
Just to be clear - I was suggesting firing a get for the key at the beginning of the transaction, but extracting the value from the future towards the end of the transaction.