I’m currently tackling the issue of adding cached metadata to a lot of already-written layers in our application, using the “new” metadataVersion key.
These layers predate the introduction of this feature. Most of them either ignored the issue completely (caching data such as a subspace’s prefix and praying that it NEVER changes at runtime) or did not implement any cache at all, which leaves a lot of latency that could be avoided.
My goals are:
- Have a global “API” that looks the same across all layers (in my case in C#/.NET, but this could apply to any other language) and that handles the caching of state shared between multiple transactions.
- Make sure that multiple layers can participate in the SAME transaction, so that even if each of them has its own cached state, they all agree on the same version of the metadata.
- Make the global cache implementation bulletproof, so that the rest of the code can rely on it without doing its own checks.
- Include the Directory Layer as well, because it also has cached state (the prefixes of all the subspaces, which we use heavily).
The end goal is to reduce transaction latency as much as possible.
My initial, naive approach is to give each layer a way to register an “init” lambda that is called to create a new “state”: it reads from the database any data that changes infrequently and returns an object that encapsulates all of it. All other transactions would then obtain a reference to this state and use it.
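A minimal sketch of that registration idea, written in Python only for brevity. `LayerRegistry`, `register`, `create_state`, `DirectoryState` and `init_directory` are hypothetical names, and a plain dict stands in for a transaction so the example runs without a cluster:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical signature: an "init" lambda receives something that can read
# the database (here a dict standing in for a transaction) and returns an
# object holding the layer's rarely-changing metadata.
InitFn = Callable[[Dict[str, Any]], Any]


@dataclass(frozen=True)
class DirectoryState:
    """Example layer state: cached subspace prefixes (treat as read-only)."""
    prefixes: Dict[str, bytes]


class LayerRegistry:
    def __init__(self) -> None:
        self._inits: Dict[str, InitFn] = {}

    def register(self, layer_name: str, init: InitFn) -> None:
        if layer_name in self._inits:
            raise ValueError(f"layer {layer_name!r} already registered")
        self._inits[layer_name] = init

    def create_state(self, layer_name: str, tr: Dict[str, Any]) -> Any:
        # Run the layer's init lambda against the current "transaction".
        return self._inits[layer_name](tr)


def init_directory(tr: Dict[str, Any]) -> DirectoryState:
    # Keep every "dir/<name>" key's value as the prefix of subspace <name>.
    prefixes = {k[len("dir/"):]: v for k, v in tr.items() if k.startswith("dir/")}
    return DirectoryState(prefixes=prefixes)


registry = LayerRegistry()
registry.register("directory", init_directory)

fake_tr = {"dir/users": b"\x15\x01", "dir/orders": b"\x15\x02", "other": b""}
state = registry.create_state("directory", fake_tr)
print(state.prefixes["users"])  # b'\x15\x01'
```

Returning a single immutable object per layer means a transaction can hold one reference for its whole lifetime, even if the cache entry is replaced underneath it.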
From experience, I know that combining caches and transactions can be tricky, because the cache must only be updated if the transaction commits successfully!
For example, take a cold start with an empty cache and an initial metadataVersion (“MV”) of 123:
T1 reads the metadataVersion (123) and checks the cache, which is empty. It then invokes the layer’s “init” lambda, which reads all the metadata from the database and creates a new “state”. The transaction can use this state, but the state cannot be published to the cache yet (only after a successful commit).
Once T1 commits, the cache is updated with the state at MV=123.
T2 starts later, reads the MV, which is still 123, finds a state in the cache with the same MV, and can use that state (even though it was created by another transaction).
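The T1/T2 sequence above can be simulated in memory to show the “publish only after commit” rule. This is a sketch under simplifying assumptions: `MetadataCache`, `Transaction`, `get_state` and `commit` are hypothetical, the MV is an integer passed in by hand, and `commit()` always succeeds (a real implementation would publish only once the database confirms the commit):

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class CachedEntry:
    metadata_version: int  # MV observed when the state was built
    state: Any


class MetadataCache:
    """Process-wide cache: layer name -> state known valid at one MV."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedEntry] = {}

    def lookup(self, layer: str, mv: int) -> Optional[Any]:
        entry = self._entries.get(layer)
        if entry is not None and entry.metadata_version == mv:
            return entry.state
        return None

    def publish(self, layer: str, mv: int, state: Any) -> None:
        self._entries[layer] = CachedEntry(mv, state)


class Transaction:
    """Toy transaction: states it builds stay private until commit()."""

    def __init__(self, cache: MetadataCache, mv: int) -> None:
        self.cache = cache
        self.mv = mv  # metadataVersion read at the start of the transaction
        self._pending: Dict[str, Any] = {}

    def get_state(self, layer: str, init: Callable[[], Any]) -> Any:
        if layer in self._pending:
            return self._pending[layer]
        state = self.cache.lookup(layer, self.mv)
        if state is None:
            state = init()                # cache miss: read metadata from the DB
            self._pending[layer] = state  # usable now, but not shared yet
        return state

    def commit(self) -> None:
        # Stand-in for a successful commit: only now do locally built
        # states become visible to other transactions.
        for layer, state in self._pending.items():
            self.cache.publish(layer, self.mv, state)
        self._pending.clear()


init_calls = 0


def init_layer() -> Dict[str, bytes]:
    global init_calls
    init_calls += 1
    return {"prefix": b"\x15\x01"}


cache = MetadataCache()
t1 = Transaction(cache, mv=123)
s1 = t1.get_state("my_layer", init_layer)    # cold cache: init runs
assert cache.lookup("my_layer", 123) is None  # nothing published yet
t1.commit()

t2 = Transaction(cache, mv=123)
s2 = t2.get_state("my_layer", init_layer)    # served from the cache
print(init_calls, s2 is s1)  # 1 True
```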
First questions:
- What if the transaction is read-only, but the layer’s “init” code needs to write something (for example, to populate missing entries)? Does this mean the “init” code may only read metadata, and that all pre-initialization must be performed elsewhere?
- What happens if multiple transactions (T1a, T1b) both start at roughly the same time while the cache is still empty? How do I know which state to insert into the cache? If both were built at the same metadataVersion, should they be identical? Or should I prefer the one with the more recent read or commit version?
- How do I deal with long-running transactions that are reset multiple times (e.g. bulk reads that read for 5 s, then reset and continue from the previous cursor)?
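For the long-running case, one possible approach (an assumption, not a settled answer) is to treat every reset as a brand-new attempt: the cursor survives the reset, but the MV is re-read and the layer state re-resolved at the start of each attempt. The sketch below models each attempt by the MV it observes; `bulk_read` and the `get_state` callback are hypothetical, and it assumes the cursor itself stays meaningful across a metadata change:

```python
from typing import Any, Callable, Dict, List, Tuple


def bulk_read(
    attempts_mv: List[int],
    rows: List[str],
    get_state: Callable[[int], Dict[str, Any]],
    batch_size: int,
) -> Tuple[List[str], List[int]]:
    cursor = 0                 # survives resets
    out: List[str] = []
    mvs_used: List[int] = []
    for mv in attempts_mv:     # one iteration per (re)started attempt
        state = get_state(mv)  # re-resolved after every reset, never reused
        mvs_used.append(state["mv"])
        batch = rows[cursor:cursor + batch_size]
        out.extend(f"{state['prefix']}:{r}" for r in batch)
        cursor += len(batch)
        if cursor >= len(rows):
            break
    return out, mvs_used


states = {123: {"mv": 123, "prefix": "A"}, 456: {"mv": 456, "prefix": "B"}}
out, used = bulk_read([123, 456], ["r1", "r2", "r3"], states.__getitem__, batch_size=2)
print(out)   # ['A:r1', 'A:r2', 'B:r3']
print(used)  # [123, 456]
```

Holding on to the state from the first attempt would have silently used prefix “A” for `r3` after the MV moved.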
Now suppose some other application, or unrelated code, makes a change elsewhere in the cluster and bumps the metadataVersion to MV=456, without touching our layer’s metadata:
T3 starts and reads MV=456, which does not match the state in the cache. Before dropping the state entirely, suppose we also have our own private “version” key that can be read quickly. We find that this key has not changed, so we can still safely use the state built at MV=123, even though the cluster is now at MV=456. The transaction can continue after only a few quick reads, instead of re-reading the entire state from the database.
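The fallback to a private version key could look like the following sketch. `LayerCache`, `Entry`, `resolve` and `publish` are hypothetical names; the layer’s version key is read through a callback, since its name and encoding are up to each layer. `resolve` only returns the entry, so the question of *when* to publish it is deliberately left to the caller:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Entry:
    metadata_version: int  # latest MV at which the state is known to be valid
    layer_version: int     # value of the layer's private version key
    state: Any


class LayerCache:
    def __init__(self) -> None:
        self.entry: Optional[Entry] = None
        self.full_reloads = 0

    def resolve(self, mv: int,
                read_layer_version: Callable[[], int],
                load_state: Callable[[], Any]) -> Entry:
        e = self.entry
        if e is not None and e.metadata_version == mv:
            return e                          # fast path: MV unchanged
        layer_version = read_layer_version()  # one cheap read
        if e is not None and e.layer_version == layer_version:
            # The cluster MV moved, but this layer did not change:
            # reuse the old state, now tagged with the new MV.
            return Entry(mv, layer_version, e.state)
        self.full_reloads += 1
        return Entry(mv, layer_version, load_state())  # slow path

    def publish(self, entry: Entry) -> None:
        # Whether this may run before commit or only after it is left
        # to the caller; the sketch does not decide it.
        self.entry = entry


cache = LayerCache()
e1 = cache.resolve(123, lambda: 7, lambda: {"prefix": b"\x15\x01"})
cache.publish(e1)  # T1 committed at MV=123
e3 = cache.resolve(456, lambda: 7, lambda: {"prefix": b"\x15\x99"})
print(cache.full_reloads, e3.state is e1.state)  # 1 True
```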
Questions:
- When is it safe to update the cache? Do I also need to wait for a successful commit, or can I immediately update the cache to record that the state is also valid at MV=456? For example, if T4 starts right after that but before T3 commits, can it safely use the cache?
Next, how do I deal with multiple rapid changes to the metadataVersion that are observed at different times by concurrent transactions? Suppose some transactions lag far behind: by the time they have read the new state (which is already outdated), they try to update the cache, while transactions that started after them but committed faster have already updated it with the most recent state. What parameters do I have to give the cache so that it can safely accept or discard the states proposed by the callbacks of all these concurrent transactions?
TBD: I still need to draw a graph for this situation as well, but it’s getting late, sorry!
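The lagging-writer problem can be reproduced with a tiny replay. This hypothetical sketch applies cache updates in the order transactions *finish*, with a naive “last writer wins” policy; T5 and T6 are illustrative names, and MVs are modelled as integers:

```python
from typing import Any, Dict, List, Tuple


def replay_last_writer_wins(events: List[Tuple[str, int, Any]]) -> Dict[str, Any]:
    # Each event is (transaction, metadata_version_seen, state_built).
    cache: Dict[str, Any] = {}
    for tx, mv, state in events:
        cache = {"mv": mv, "state": state, "writer": tx}  # blind overwrite
    return cache


# T5 started first and saw MV=456, but it is slow.
# T6 started later, saw MV=789, and finished first.
finish_order = [
    ("T6", 789, "state@789"),
    ("T5", 456, "state@456"),  # lagging writer arrives last
]
print(replay_last_writer_wins(finish_order))
# {'mv': 456, 'state': 'state@456', 'writer': 'T5'}
```

The cache ends up holding the older state, so every new transaction at MV=789 misses and rebuilds, and a cache that matched on “closest” rather than exact MV would serve stale data.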
Questions:
- Is there a universal way to safely update the cache, given the (read_version, metadata_version, commit_version) tuple of a transaction, that handles concurrent and lagging transactions? I suspect this is the same question anyone implementing this kind of caching will face.
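One candidate rule, offered as an assumption rather than a proven answer: accept a proposed state only if its (metadata_version, read_version) pair is strictly greater than the cached one, under a lock. In FoundationDB the metadataVersion value is a versionstamp, so values can be ordered; here they are modelled as integers. `GuardedCache`, `Proposal` and `propose` are hypothetical, and commit_version is not used:

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Proposal:
    metadata_version: int  # MV read by the proposing transaction
    read_version: int      # read version of that transaction
    state: Any

    def order_key(self) -> Tuple[int, int]:
        return (self.metadata_version, self.read_version)


class GuardedCache:
    """Accepts a proposal only if it is strictly newer than the cached one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: Optional[Proposal] = None

    def propose(self, p: Proposal) -> bool:
        with self._lock:
            cur = self._current
            if cur is not None and p.order_key() <= cur.order_key():
                return False  # stale or duplicate: keep the newer entry
            self._current = p
            return True

    def get(self, metadata_version: int) -> Optional[Any]:
        with self._lock:
            cur = self._current
            if cur is not None and cur.metadata_version == metadata_version:
                return cur.state
            return None


cache = GuardedCache()
print(cache.propose(Proposal(789, 1_000, "state@789")))  # True  (T6 finishes first)
print(cache.propose(Proposal(456, 900, "state@456")))    # False (T5 lagged)
print(cache.get(789))                                     # state@789
```

With this rule the order in which transactions finish no longer matters: a lagging proposal is simply rejected, and two proposals at the same MV (T1a/T1b) resolve deterministically in favour of the later read version.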