A new tool for managing layer metadata

I am moving the discussion from the GitHub pull request "Added a metadata version key" (Pull Request #1213 on apple/foundationdb, by etschannen) over to the forums.

A common pattern in layers is that there is a rarely changing schema that needs to be validated within each transaction. These reads of a small amount of data can be a burden on clusters, because the reads are concentrated on a very small section of key space.

The PR adds a key to the system keyspace which has its value sent to the client with every read version. This means that clients can see the value of this key without communicating with storage servers.

Q1:

If the read version is not known yet, it will block until the read version comes back from the proxy. If you already have a read version, it is guaranteed to be synchronous.

Q2:

This should not be too big of a concern. Each change will cause every client to invalidate their cache, so if you have 1000 clients, changing it will cause 1000 reads to the location where metadata is stored.
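To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 1000 clients figure comes from the post above; the per-client transaction rate and the function names are made up for illustration.

```python
def schema_reads_per_second_without_cache(clients: int, txn_per_client_per_sec: int) -> int:
    # Without caching, every transaction reads the schema key from storage.
    return clients * txn_per_client_per_sec


def schema_reads_per_change_with_cache(clients: int) -> int:
    # With the metadata version, each client re-reads the schema once per change.
    return clients


clients = 1000
txn_rate = 100  # hypothetical: transactions per client per second

print(schema_reads_per_second_without_cache(clients, txn_rate))  # steady load
print(schema_reads_per_change_with_cache(clients))               # one-off burst
```

The first number is a sustained load on one small key range, while the second is a single burst that only happens when the metadata actually changes.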

Q3:

This is a plausible use case. @ajbeamon might know more about how easy it would be to implement.

In the GitHub PR, I didn’t quite get how this metadata key should be used to mitigate the hot key problem.

Can anyone (@alloc @Evan ) give a simple example to explain how this version key should be used?

The shortest example I can think of is if you have a document store that can add and drop indexes at runtime.

You need to maintain a list of which indexes are active, and one way is to read a key during every transaction that holds the list. That way any time an index is added or removed you will see it right away.

If every transaction has to read a single key, you will eventually overload the storage servers responsible for that key. It also taxes every transaction with a small amount of latency (although you could probably hide it by optimistically assuming the schema is valid and checking it later in the same transaction). Note that you still need to manage the backfilling and deleting of indexes in the background; all clients will simply observe those state transitions immediately.

The alternative is to somehow cache the schema. The goal is to operate the cache such that any client only has either the current version or the previous version of the schema. If you can maintain that, you can use the online schema change protocol from F1. This is possible in FDB, but every layer that wants online schema changes has to write a fair amount of code to do it. A bug in this code is almost guaranteed to create inconsistencies in the data, such as index entries that point to nothing.

This change sends a third type of version (other than read and commit), the “metadata version” of a transaction, back to the client when they begin a transaction. This lets clients cache the schema and invalidate it when the metadata version differs from the one they have cached.

When a client detects a change, it can read the actual schema key and continue serving requests. This means only the transactions at that time need to read the actual schema key.
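A minimal sketch of that caching pattern, using an in-memory stand-in rather than the real FDB bindings. `FakeCluster`, `CachingClient`, and all of their methods are hypothetical names for illustration, and the metadata version is modeled as a plain counter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeCluster:
    """In-memory stand-in for a cluster (hypothetical, not the FDB API)."""
    metadata_version: int = 0
    schema: tuple = ("idx_name",)
    schema_reads: int = 0  # counts reads that would hit storage servers

    def begin_transaction(self) -> int:
        # The metadata version arrives with the read version: no storage read.
        return self.metadata_version

    def read_schema(self) -> tuple:
        self.schema_reads += 1
        return self.schema

    def change_schema(self, new_schema: tuple) -> None:
        # The schema update and the version bump happen together.
        self.schema = new_schema
        self.metadata_version += 1


@dataclass
class CachingClient:
    cluster: FakeCluster
    cached_version: Optional[int] = None
    cached_schema: Optional[tuple] = None

    def run_transaction(self) -> tuple:
        version = self.cluster.begin_transaction()
        if version != self.cached_version:
            # Cache is empty or stale: read the real schema key once.
            self.cached_schema = self.cluster.read_schema()
            self.cached_version = version
        return self.cached_schema


cluster = FakeCluster()
client = CachingClient(cluster)
for _ in range(500):
    client.run_transaction()
cluster.change_schema(("idx_name", "idx_age"))
for _ in range(500):
    client.run_transaction()
print(cluster.schema_reads)  # 2: one initial load, one reload after the change
```

In this toy run, 1000 transactions cause only two reads of the schema key, and the client never serves a transaction with a schema older than the version it saw at transaction start.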

Thank you very much for the example!

Now I get the idea. To summarize:

To solve the hot key problem, we can cache the keys in the layers (e.g., RecLayer or DocLayer).
To keep the cached data consistent with the data in DB (storage servers), we need to invalidate the cache whenever the data is changed.
The metadata key can be used as a mechanism to notify the layers/clients that their cached data may have changed, and they need to invalidate their cached data.

Will we also be able to set a watch on this key and be notified when it changes (for monitoring tools, etc.)?

@Evan considering this key is in the system keyspace, do we need to set the option ACCESS_SYSTEM_KEYS on the transactions reading or writing this key? I hope that’s not the case, at least for read transactions, as that could be every transaction.

The key has been given special treatment in NativeAPI and ReadYourWrites (RyW being the relevant one here I think?)

As Ryan figured out, you do not need to call ACCESS_SYSTEM_KEYS to read or modify this key.

I did not add support for watching this key efficiently, meaning if you watch the key, the watch will go to the database like any other watch. Is there an actual use case for using the watch API? In some sense the whole point of the key is to watch for changes in a transactional manner, and the watch API creates a future which is separated from the transaction that created it.

This would be for monitoring tools or data visualization tools that could be notified when the “schema” is changed and automatically refresh the page. Currently, you need to actively read the key to notice that it changed, which would require a monitoring tool to do polling.

I don’t think that this would be a problem if the watch would go to the database for this particular use case.
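For comparison, the polling approach a monitoring tool would otherwise need can be sketched like this. `poll_once` and its parameters are hypothetical, and `read_version` stands in for whatever call the tool uses to read the metadata version.

```python
from typing import Callable, List, Optional


def poll_once(
    read_version: Callable[[], bytes],
    last_seen: Optional[bytes],
    on_change: Callable[[bytes], None],
) -> bytes:
    """Read the current metadata version and notify if it differs from last_seen."""
    current = read_version()
    if last_seen is not None and current != last_seen:
        on_change(current)
    return current


# Simulated sequence of values a tool might observe on successive polls.
versions = iter([b"v1", b"v1", b"v2", b"v2", b"v3"])
notifications: List[bytes] = []

last: Optional[bytes] = None
for _ in range(5):
    last = poll_once(lambda: next(versions), last, notifications.append)

print(notifications)  # [b'v2', b'v3']
```

A real tool would sleep between polls, so it notices a change only after up to one polling interval, which is the delay a watch would avoid.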

This looks like a cool enhancement. I’m wondering if there’s been any further discussion about Q3 (the use of metadataVersion by the DirectoryLayer). It feels like a very natural integration, and one that I think we could use to good effect to model databases in Apache CouchDB. /cc @ajbeamon

I don’t think it would be too hard to add support for it, though I’m not sure yet exactly what the API for it should look like. I created an issue in GitHub to track it: