A new tool for managing layer metadata

Evan · March 2, 2019, 2:24am

I am moving the discussion from pull request #1213 on apple/foundationdb ("Added a metadata version key" by etschannen) over to the forums.

A common pattern in layers is that there is a rarely changing schema that needs to be validated within each transaction. These reads of a small amount of data can be a burden on clusters, because the reads are concentrated on a very small section of key space.

The PR adds a key to the system keyspace which has its value sent to the client with every read version. This means that clients can see the value of this key without communicating with storage servers.

Q1:

If the read version is not known yet, it will block until the read version comes back from the proxy. If you already have a read version, it is guaranteed to be synchronous.

Q2:

This should not be too big of a concern. Each change will cause every client to invalidate their cache, so if you have 1000 clients, changing it will cause 1000 reads to the location where metadata is stored.
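As a back-of-the-envelope illustration of that cost, the sketch below counts reads of the metadata location in a deliberately simplified model. It assumes each client loads the metadata once, reloads it once per change, and serves every other transaction from its cache. The function name and the workload numbers are made up for illustration.

```python
def metadata_reads(num_clients, txns_per_client, num_changes, cached):
    """Count reads of the metadata location in a simplified model."""
    if not cached:
        # Without caching, every transaction validates the schema by reading it.
        return num_clients * txns_per_client
    # With caching: one initial load per client, plus one reload per change.
    return num_clients * (1 + num_changes)


# Hypothetical workload: 1000 clients, 10,000 transactions each, one change.
uncached_reads = metadata_reads(1000, 10_000, 1, cached=False)
cached_reads = metadata_reads(1000, 10_000, 1, cached=True)
print(uncached_reads, cached_reads)
```

In this model a single change costs one extra read per client, independent of how many transactions each client runs.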

Q3:

This is a plausible use case. @ajbeamon might know more about how easy it would be to implement.

mengxu · March 2, 2019, 6:55am

In the GitHub PR, I didn’t quite get how this metadata key should be used to mitigate the hot key problem.

Can anyone (@alloc @Evan ) give a simple example to explain how this version key should be used?

ryanworl · March 2, 2019, 1:21pm

The shortest example I can think of is if you have a document store that can add and drop indexes at runtime.

You need to maintain a list of which indexes are active, and one way is to read a key during every transaction that holds the list. That way any time an index is added or removed you will see it right away.

If every transaction has to read a single key, you will eventually overload the storage servers holding that key. It also taxes every transaction with some small amount of latency (although you could probably hide it by optimistically assuming the schema is valid and checking it yourself later in the transaction). Note that you still need to manage the backfilling and deleting of indexes in the background; you’ll just observe those state transitions on all clients immediately.

The alternative is to somehow cache the schema. The goal is to operate the cache such that any client has only either the current version or the previous version of the schema. If you can maintain that, you can use the online schema change protocol from F1. This is possible in FDB, but every layer that wants online schema changes has to write a fair amount of code to do it. A bug in this code is almost guaranteed to create inconsistencies in the data, such as index entries that point to nothing.

This change sends a third type of version (other than read and commit), the “metadata version” of a transaction, back to the client when it begins a transaction. This lets clients cache the schema and invalidate it when the metadata version differs from the one they have cached.

When a client detects a change, it can read the actual schema key and continue serving requests. This means only the transactions at that time need to read the actual schema key.
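A minimal sketch of this caching pattern, written in Python against an in-memory stand-in rather than the real FoundationDB bindings. `FakeCluster`, `SchemaCache`, and the index names are hypothetical and exist only for illustration. In a real layer the metadata version would arrive with the read version instead of being read from an object field.

```python
class FakeCluster:
    """In-memory stand-in for a cluster (hypothetical, not the fdb API)."""

    def __init__(self, schema):
        self.metadata_version = 0  # bumped on every schema change
        self.schema = schema       # stands in for the "hot" schema key
        self.schema_reads = 0      # counts reads that would hit storage

    def begin_transaction(self):
        # Models the metadata version being handed to the client
        # when a transaction begins.
        return self.metadata_version

    def read_schema(self):
        self.schema_reads += 1
        return list(self.schema)

    def change_schema(self, schema):
        # A schema change must also bump the metadata version.
        self.schema = schema
        self.metadata_version += 1


class SchemaCache:
    """Per-client cache, invalidated when the metadata version differs."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cached_version = None
        self.cached_schema = None

    def schema_for_transaction(self):
        version = self.cluster.begin_transaction()
        if version != self.cached_version:
            # Version mismatch: only now is the actual schema key read.
            self.cached_schema = self.cluster.read_schema()
            self.cached_version = version
        return self.cached_schema


cluster = FakeCluster(["idx_name"])
client = SchemaCache(cluster)

for _ in range(100):
    client.schema_for_transaction()
reads_before_change = cluster.schema_reads

cluster.change_schema(["idx_name", "idx_age"])
for _ in range(100):
    latest = client.schema_for_transaction()
reads_after_change = cluster.schema_reads
```

In this model, 200 transactions trigger only two reads of the schema: the initial load and the reload after the change.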

mengxu · March 3, 2019, 4:48am

Thank you very much for the example!

Now I get the idea. To summarize:

To solve the hot key problem, we can cache the keys in the layers (e.g., RecLayer or DocLayer).
To keep the cached data consistent with the data in the DB (storage servers), we need to invalidate the cache whenever the data is changed.
The metadata key can be used as a mechanism to notify the layers/clients that their cached data may have changed and needs to be invalidated.

KrzysFR · March 3, 2019, 4:24pm

Will we also be able to set a watch on this key and be notified when it changes? (for monitoring tools, etc.)

mbhaskar · March 4, 2019, 3:54pm

@Evan considering this key is in the system key space, do we need to set the ACCESS_SYSTEM_KEYS option on the transactions reading or writing this key? I hope that’s not the case, at least for read transactions, as that could be every transaction.

ryanworl · March 4, 2019, 4:02pm

apple/foundationdb/blob/075fdef31a49d06d0e39cfa2714f33b1cfcac71e/fdbclient/ReadYourWrites.actor.cpp#L1225


		return Optional<Value>();
	}


	if(checkUsedDuringCommit()) {
		return used_during_commit();
	}


	if( resetPromise.isSet() )
		return resetPromise.getFuture().getError();
	
	if(key >= getMaxReadKey() && (!tr.apiVersionAtLeast(610) || key != metadataVersionKey))
		return key_outside_legal_range();


	//There are no keys in the database with size greater than KEY_SIZE_LIMIT
	if(key.size() > (key.startsWith(systemKeys.begin) ? CLIENT_KNOBS->SYSTEM_KEY_SIZE_LIMIT : CLIENT_KNOBS->KEY_SIZE_LIMIT))
		return Optional<Value>();


	Future< Optional<Value> > result = RYWImpl::readWithConflictRange( this, RYWImpl::GetValueReq(key), snapshot );
	reading.add( success( result ) );
	return result;
}

The key has been given special treatment in NativeAPI and ReadYourWrites (RyW being the relevant one here I think?)

Evan · March 5, 2019, 5:24am

As Ryan figured out, you do not need to set the ACCESS_SYSTEM_KEYS option to read or modify this key.

I did not add support for watching this key efficiently, meaning that if you watch the key, the watch will go to the database like any other watch. Is there an actual use case for using the watch API? In some sense the whole point of the key is to watch for changes in a transactional manner, and the watch API creates a future which is separated from the transaction that created it.

KrzysFR · March 5, 2019, 8:37am

This would be for monitoring tools or data visualization tools that could be notified when the “schema” is changed and automatically refresh the page. Currently, you need to actively read the key to notice that it changed, which would require a monitoring tool to do polling.

I don’t think it would be a problem for the watch to go to the database for this particular use case.

kocolosk · April 3, 2019, 8:09pm

This looks like a cool enhancement. I’m wondering if there’s been any further discussion about Q3 (the use of metadataVersion by the DirectoryLayer). It feels like a very natural integration, and one that I think we could use to good effect to model databases in Apache CouchDB. /cc @ajbeamon

ajbeamon · April 3, 2019, 11:30pm

I don’t think it would be too hard to add support for it, though I’m not sure yet exactly what the API for it should look like. I created an issue in GitHub to track it:
