How to safely add a metadata caching "layer" on top of existing layers?

So, after attempting to add a cache on top of the directory layer that composes well with other layers (when used in the same transaction), the answer is that it is not very easy to do: it requires careful handling and comes with a few restrictions.

  1. Reading the \xff/metadataVersion key will FAIL after it has been changed in the same transaction, and it will also doom the transaction when it attempts to commit. The only sensible way to deal with this is to prevent the read and return ‘null’ when this is the case (see the first sketch after this list). Ideally, this should be addressed at the binding layer with a dedicated API, because it is difficult to implement correctly!

  2. When a layer A observes a ‘null’ value for the metadata version, it means that another layer B already changed something, somewhere, earlier in the current transaction. This does not necessarily mean that the cache of layer A is unusable, but layer A cannot know that and has to check again.

  3. Even if layer A detects a change and updates its own cache context, it cannot blindly reuse that context in the next transaction: the current transaction can fail to commit, or another layer can interfere and prevent it from knowing which metadataVersion value is linked to that context. The only way is to check right before committing, and discard the newly constructed cache context if the metadata version was changed. Only transactions where nobody changed the metadata version can publish a new cache context for the next ones (see the second sketch after this list).

  4. Attempting to build a “smart” cache that observes local schema mutations in the same transaction is very difficult to do if the same transaction can be accessed from multiple threads. The best bet is to enforce mutual exclusion between operations that use the cache and operations that mutate the schema. Especially: resources obtained from the cache before calling methods that change the schema are suspect and should be read again!

  5. Each layer needs at least a local “version” key on top of the global metadataVersion key, which has to be updated every time the schema is changed, and can then be used to quickly revalidate the cache (ideally with a single read; see the second sketch after this list). If the layer uses a versionstamp for that key (which is sensible), then it will fall into the same trap as in 1) and has to be extra careful not to attempt to read that key again in the same transaction (locking is required if the layer code is multi-threaded!)

  6. Any layer at level N, in the stack of layers, SHOULD NOT cache any data obtained from the cache of the layer at level N-1. Instead it should request the data every time it needs it, and rely on that layer’s cache to be efficient. Ideally, any cached resource returned to the outside should have a “self destruct” option that the layer can trigger if the previous cache context is invalidated, enforcing the rule of “don’t put the cached resource in a static somewhere!”
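
To make point 1 more concrete, here is a minimal sketch in Python (using the standard fdb binding) of the kind of guard a binding could expose: it remembers whether the current transaction has already bumped \xff/metadataVersion, and returns ‘null’ (here None) instead of issuing a read that would fail and doom the commit. MetadataVersionGuard and its methods are hypothetical names, and the guard only helps if every layer sharing the transaction goes through it, which is why this really belongs in the binding itself.

```python
import fdb

fdb.api_version(630)

METADATA_VERSION_KEY = b'\xff/metadataVersion'
# 10-byte versionstamp placeholder followed by a 4-byte little-endian offset
# of 0: the parameter layout expected by SET_VERSIONSTAMPED_VALUE since API 520.
VERSIONSTAMP_PARAM = b'\x00' * 14


class MetadataVersionGuard(object):
    """Hypothetical helper that remembers whether this transaction already
    changed the metadataVersion key, so later reads can be skipped instead of
    failing and dooming the commit."""

    def __init__(self, tr):
        self.tr = tr
        self.mutated = False

    def read(self):
        # Once the key has been changed in this transaction, reading it again
        # would raise an error and doom the commit; return None instead and
        # let the caller treat its cache as "unknown, must revalidate".
        if self.mutated:
            return None
        return self.tr.get(METADATA_VERSION_KEY).wait()

    def bump(self):
        # Atomically replace the value with the commit's versionstamp.
        self.tr.set_versionstamped_value(METADATA_VERSION_KEY, VERSIONSTAMP_PARAM)
        self.mutated = True
```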

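Points 3 and 5 (and the locking concern from point 4) could then combine into something like the following sketch, where each layer keeps a hypothetical “version” key of its own next to its data, revalidates the published cache context with at most one extra read, and only publishes a freshly built context after a successful commit of a transaction in which nobody bumped the metadata version. CacheContext, LayerCache and the key layout are illustrative assumptions, not an existing API:

```python
import threading

import fdb

fdb.api_version(630)

METADATA_VERSION_KEY = b'\xff/metadataVersion'
VERSIONSTAMP_PARAM = b'\x00' * 14     # 10-byte placeholder + 4-byte offset (0)


class CacheContext(object):
    """Hypothetical immutable snapshot of one layer's parsed metadata, tagged
    with the global and per-layer version values it was built from."""

    def __init__(self, global_version, layer_version, entries):
        self.global_version = global_version
        self.layer_version = layer_version
        self.entries = entries            # e.g. { table_name: parsed_schema }


class LayerCache(object):
    """Hypothetical per-layer cache following points 3, 4 and 5."""

    def __init__(self, layer_version_key):
        self.layer_version_key = layer_version_key   # this layer's own "version" key
        self.lock = threading.Lock()      # point 4: cache users vs schema mutators
        self.published = None             # CacheContext visible to new transactions

    def bump_layer_version(self, tr):
        """Point 5: every schema change bumps the layer's own version key and
        the global metadata version. After this call, neither key may be read
        again in this transaction (same trap as point 1)."""
        tr.set_versionstamped_value(self.layer_version_key, VERSIONSTAMP_PARAM)
        tr.set_versionstamped_value(METADATA_VERSION_KEY, VERSIONSTAMP_PARAM)

    def try_get(self, tr, observed_global_version):
        """Revalidate the published context. `observed_global_version` is the
        metadata version read safely by the caller, or None if the key was
        already mutated earlier in this transaction (points 1 and 2)."""
        with self.lock:
            ctx = self.published
        if ctx is None:
            return None
        if (observed_global_version is not None
                and observed_global_version == ctx.global_version):
            return ctx                    # fast path: nothing changed anywhere
        # Slow path: a single extra read of this layer's own version key
        # (assumes this layer has not bumped it in this transaction; if it
        # had, it would already know that its cache is stale).
        if tr.get(self.layer_version_key).wait() == ctx.layer_version:
            return ctx                    # something changed, but not this layer
        return None                       # this layer's schema changed: rebuild

    def maybe_publish(self, new_ctx, metadata_version_mutated):
        """Point 3: call only AFTER a successful commit, and only if nobody
        changed the metadata version inside that transaction; otherwise the
        versionstamp that new_ctx would need to be tagged with cannot be
        known here, so the freshly built context must be discarded."""
        if metadata_version_mutated:
            return
        with self.lock:
            self.published = new_ctx
```
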
If these requirements are met, then it looks like it is possible to build multiple levels of caching on top of each other, and the result should be efficient for transactions that do not mutate the schema of any of these layers.

So, as an example, consider combining the Directory Layer with a hypothetical Record Layer that uses directory subspaces to store the content of tables and indexes, and has complex metadata that needs to be parsed in memory to be efficient:

  • The Directory Layer has a TryOpenCached(path) API that returns a new subspace instance, but uses a cache context to store the prefix of each subspace. In most cases, no reads will be performed (except the initial GRV, but that is inevitable).

  • The Record Layer has a GetCachedTable(tableName) method that returns a new “Table” instance whose metadata (schema, indexes, …) comes from the cache context, but this layer HAS to call the “TryOpenCached(…)” method on the DL every time it wants to read/write keys from the tables or indexes. It MUST NOT store the subspace instance obtained in the transaction in its own cache, because it has NO WAY to know whether the Directory Layer’s cache has been invalidated since (see the sketch below).
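
To illustrate the second bullet, here is a rough Python sketch of what the Record Layer’s “Table” handle could look like: it keeps only the directory path and the parsed schema, and re-resolves the subspace through the Directory Layer’s cached open on every operation instead of holding on to the subspace instance. try_open_cached stands in for the TryOpenCached(…) call above, and the whole class is hypothetical:

```python
class Table(object):
    """Hypothetical handle returned by the Record Layer's GetCachedTable().
    It remembers the directory path of its data, never the resolved subspace."""

    def __init__(self, directory_layer, path, schema):
        self.directory_layer = directory_layer
        self.path = path                  # e.g. ('records', 'tables', 'users')
        self.schema = schema              # parsed metadata from the RL cache context

    def _subspace(self, tr):
        # Re-resolve through the DL on EVERY call (rule 6 above): the DL's own
        # cache makes this cheap, and only the DL can tell whether that cache
        # is still valid for this transaction.
        return self.directory_layer.try_open_cached(tr, self.path)

    def get_record(self, tr, primary_key):
        return tr.get(self._subspace(tr).pack((primary_key,))).wait()

    def set_record(self, tr, primary_key, value):
        tr.set(self._subspace(tr).pack((primary_key,)), value)
```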

Ideally, the cached subspace instance returned by the Directory Layer holds a pointer to the original cache context, and it will always check that this context is still active before encoding or decoding keys. If the original cache context is destroyed by the DL, then that instance becomes poisoned and throws errors instead.
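
That “poisoned” handle could look like the following sketch: the cached subspace keeps a reference to the cache context it was created from, and refuses to encode or decode keys once the Directory Layer has invalidated that context. All names here are made up for illustration:

```python
import threading

import fdb
import fdb.tuple

fdb.api_version(630)


class DirectoryCacheContext(object):
    """Hypothetical cache context owned by the Directory Layer."""

    def __init__(self):
        self._alive = True
        self._lock = threading.Lock()

    def invalidate(self):
        # Called by the DL when it detects that the metadata version (or its
        # own version key) has moved: every handle built from this context
        # becomes unusable from that point on.
        with self._lock:
            self._alive = False

    def check_alive(self):
        with self._lock:
            if not self._alive:
                raise RuntimeError(
                    "stale directory subspace: its cache context has been "
                    "invalidated by the Directory Layer; re-open the path")


class CachedDirectorySubspace(object):
    """Hypothetical subspace handle that self-destructs with its context."""

    def __init__(self, context, prefix):
        self._context = context
        self._prefix = prefix             # raw key prefix (bytes)

    def pack(self, t):
        self._context.check_alive()       # poison check before encoding a key
        return self._prefix + fdb.tuple.pack(t)

    def unpack(self, key):
        self._context.check_alive()       # ...and before decoding one
        return fdb.tuple.unpack(key[len(self._prefix):])
```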