After having implemented caching in the Directory Layer and several other complex layers using the \xff/metadataVersion
key, and having used it for a while, I ended up having to scrap this idea, for mainly three reasons:
- Relying on a single global key creates too much noise from other sources in the cluster.
- Adding another key to the Directory Layer (used to test if the change is indeed coming from the DL or from something else) is a breaking change that requires a new version of the DL implementation in all bindings.
- Both the global metadata key and the “local” key used by layers will completely miss any admin operation performed via fdbcli (or an outside script); see “Should mutations performed via fdbcli always update the \xff/metadataVersion key?”. It can be argued that “it should not happen”, but if it does happen, the result is catastrophic: all running instances will continue with the cache and data corruption will occur.
My feeling is that the most robust implementation for caching would be the ability for a transaction to set read conflict ranges that can “go back in time”, i.e.: instead of setting a conflict range starting at the current transaction’s read-version, it could specify an older version (probably the value of the read-version when the cache was first created).
This feature does not exist, and I’m not sure if it could be done in such a way that a cache could live longer than 5 seconds anyway.
In the meantime, I had to find another solution that 1) can be retrofitted onto existing layers without requiring versioning, and 2) is robust enough to not be vulnerable to admin mistakes, like doing a clear range in fdbcli.
I ended up adding a “deferred value-checks” feature in the binding itself: a “value-check” is an outstanding asynchronous read, paired with an expected result value, which runs concurrently with the execution of the transaction handler. Just before commit (for r/w transactions) or before the handler completes (for read-only transactions), the binding will wait for all these outstanding reads to complete, and verify that they all return the expected values. If any read fails or returns a different value, the attempt is failed (by simulating a not_committed error), and the retry loop will retry the execution.
Of course, in the next attempt, the layer code must be able to know that a value-check failed (or risk looping forever). The binding offers another API that can be used to check if something failed in the previous attempt. The layer can then decide to drop any previous cache, and re-read everything.
This is very similar to the way FDB does optimistic locking, by inducing retries that force the code to run again, except that in this case the layer code must notice this happening and react accordingly.
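To make the mechanism a bit more concrete, here is a rough sketch of what the “wait and verify” step could look like. This is purely illustrative: the type and method names below are made up, and the real binding does more work (error handling, read-only vs r/w transactions, simulating the not_committed error, etc.).

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
// 'Slice' is the byte-string type from the FoundationDB .NET binding

// Hypothetical model of a deferred value-check: the read is started as soon as
// the check is registered, and is only awaited just before commit.
public sealed record PendingValueCheck(string Tag, Slice Key, Slice ExpectedValue, Task<Slice> Read);

public static class ValueCheckSketch
{
	// Waits for all outstanding reads, and returns the tags of the checks that
	// did not observe their expected value. A non-empty result means the attempt
	// must be failed (as if not_committed had been thrown) and retried.
	public static async Task<HashSet<string>> VerifyAsync(IReadOnlyList<PendingValueCheck> checks)
	{
		await Task.WhenAll(checks.Select(c => c.Read));

		var failedTags = new HashSet<string>();
		foreach (var check in checks)
		{
			if (!check.Read.Result.Equals(check.ExpectedValue))
			{
				failedTags.Add(check.Tag);
			}
		}
		return failedTags;
	}
}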
Pros:
- Layers can check multiple keys in parallel without having to run them sequentially (ex: the Directory Layer can “revalidate” all the nodes from the root of a deeply nested path in parallel instead of sequentially).
- Layers can have their own set of keys that are not shared with anyone else.
- Layers can keep a cache around when creating new entries (ex: the DL only tracks subspaces opened previously, so creating or deleting a directory that is not in the cache will not bust the cache!)
- Layers can reuse existing keys for the value-checks (ex: the DL reuses the keys that map names to prefixes).
- This method allows multiple layers to work on the same transaction without interference, and easily decide if their own cache is bad, or if it’s someone else’s cache.
- If the data is changed by any external tool or script, the layer will observe it (ex: a clear-range in fdbcli will make all value-checks fail, and the DL will notice it on the next attempt).
Cons:
- Compared to the global metadata version key, this method requires at least two “hops” to the cluster, because we still need to wait for the read-version, and then wait for the outstanding reads to complete, before calling commit. It is possible that the two hops could be merged into one if the transaction only writes.
- The implementation of caching is more complex and error-prone because it works “across” retries.
- It is currently done at the binding level, but this has some limitations; it would work better if done at a lower level.
These are the methods I have added to the .NET API:
Note: the “context” class is the type that manages the lifetime of a transaction in the .NET binding. It is accessed via the tr.Context property, which “hides” these APIs away from the view of “normal” users, since this is essentially an expert-level API!!!
public class FdbOperationContext
{
	//...

	// Registers a deferred check that 'key' still contains 'expectedValue';
	// the read starts immediately and is verified just before commit.
	public void AddValueCheck(string tag, Slice key, Slice expectedValue);

	// Registers multiple deferred checks at once, all sharing the same tag.
	public void AddValueChecks(string tag, IEnumerable<KeyValuePair<Slice, Slice>> items);

	// Tells whether a value-check with this tag failed in the previous attempt.
	public bool? ValueCheckFailedInPreviousAttempt(string tag);

	// True if any value-check, regardless of tag, failed in the previous attempt.
	public bool HasAtLeastOneFailedValueCheck { get; }
}
During the first attempt, the layer calls tr.Context.AddValueCheck(...) or tr.Context.AddValueChecks(...) with a list of key/value pairs, but the call is ‘void’. Each value-check is paired with a tag, which is an opaque string.
On the next retry (meaning that at least one value-check failed), the layer can call tr.Context.ValueCheckFailedInPreviousAttempt(...) with the same tag, and get a result. If the method returns true, the layer knows that the value changed at some point in the past, and throws out the cache.
Here is a somewhat contrived example of code using these checks:
async Task SomeLayerMethod(IFdbTransaction tr, ....)
{
	CacheContainer? cachedMetadata = .....;
	if (cachedMetadata != null)
	{ // we have to re-validate the cache!
		if (tr.Context.ValueCheckFailedInPreviousAttempt("acmeLayerId") == true)
		{ // we know from the previous attempt that something changed!
			cachedMetadata = null; // drop the cache
		}
		else
		{ // optimistically use the cached data, but add a value-check for this transaction.
			tr.Context.AddValueCheck("acmeLayerId", cachedMetadata.CheckKey, cachedMetadata.CheckValue);
		}
	}

	if (cachedMetadata == null)
	{ // we must read all the metadata from the database
		cachedMetadata = await GetMetadata(tr, ...); // reload metadata from the db

		// keep this around _only_ if the transaction commits with success
		tr.Context.OnSuccess((ctx, _) => { /* store the cached metadata somewhere */ });
	}

	// run the rest of the transaction as normal
	tr.Set(.....);
	await tr.GetAsync(....);
}
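As mentioned in the pros above, a layer that needs to validate several keys at once (like all the nodes of a deeply nested path) can use the bulk variant. Here is a small illustrative fragment (inside a transaction handler); the keys and values are placeholders and do not reflect the actual Directory Layer key format, and I’m assuming a Slice.FromString(...) helper for readability:

// Illustrative only: revalidate every node of a nested path ("foo", "foo/bar", "foo/bar/baz") at once.
// All the reads start immediately, and will be awaited just before commit.
var checks = new List<KeyValuePair<Slice, Slice>>
{
	new(Slice.FromString("node/foo"), Slice.FromString("prefix_foo")),
	new(Slice.FromString("node/foo/bar"), Slice.FromString("prefix_bar")),
	new(Slice.FromString("node/foo/bar/baz"), Slice.FromString("prefix_baz")),
};
tr.Context.AddValueChecks("acmeLayerId", checks);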
There are several performance issues, due to the fact that I’m currently handling this at the binding level:
- I don’t dedup value-checks for the same key in a transaction: I rely on the fact that the C binding already does that (merging multiple concurrent reads on the same key), but I still need to allocate tasks and futures.
- When a value-check is created, I start the read immediately. If the same transaction changes the value after that, the value-check should not see the updated value. But if the code then creates a new value-check for the same key, that read should see the updated value (see the sketch after this list). This alone means that I cannot easily “merge” checks for the same key, because it is difficult to know whether the value was changed locally or not. The C binding does know that, and could easily implement this optimization!
- Most layers have to check multiple values at once. I re-used the fake “GetValues(…)” method of the .NET binding that creates multiple futures (one for each key) but only calls back into the .NET world once (with the array of results). If this feature is more heavily used, it would be nice if the C binding could provide such a bulk read method, to reduce the number of allocations and native interop.
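To illustrate the second point above, here is a hypothetical sequence (the key and values are made up) showing why naively merging two checks on the same key would be incorrect:

// A key that the layer uses for its value-checks (placeholder name).
Slice key = Slice.FromString("acme/metadata_version");

// This check must observe the value as it was when the check was registered ("v1")...
tr.Context.AddValueCheck("acmeLayerId", key, Slice.FromString("v1"));

// ...even if the same transaction overwrites the key later on.
tr.Set(key, Slice.FromString("v2"));

// A second check, registered after the local write, should observe the new value ("v2").
tr.Context.AddValueCheck("acmeLayerId", key, Slice.FromString("v2"));

// Collapsing both checks into a single read would make one of them report a false mismatch.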