KeySpace and FDB Directory Layer

What’s the relationship between the Record Layer’s KeySpace and the FoundationDB Directory Layer? Is a directory in a KeySpace ultimately implemented with the directory layer?

The motivation for the question is a hybrid storage schema on a single FoundationDB cluster. I’m wondering what the proper approach would be for using the record layer within a certain subdirectory. That way, storage could be managed by the directory layer and used in different ways, with or without the record layer.

Thanks,

Randy

To answer the motivating question first, yes, it is best practice to use the directory layer when a single cluster has multiple tenants. And, yes, simple uses of KeySpace with a string-valued root KeySpaceDirectory will do that.

To give more context to what all this is trying to accomplish: the goal is to have compact keys for record store subspaces. This may be less of an issue with upcoming storage engine improvements that do prefix compression, in which case this is mostly of historical interest.

A KeySpace defines the logical structure of these keys. It is a template for a KeySpacePath, which binds the elements of the structure to particular values. These paths are, in turn, resolved into actual Tuple keys (or, equivalently, Subspaces). By default, this resolution uses the directory layer to convert any string components into integers, which are more compact. It does not do this the way directory layer subdirectories do, though, because that requires sequential reads per level. Rather, it does them independently and then concatenates. This also means that subkeys that are reused get the same value, so the directory layer doesn’t grow as fast.
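To make the "resolve independently, then concatenate" idea concrete, here is a toy sketch in plain Python with no FDB client involved. ToyInterner is a hypothetical stand-in for the directory layer's string-to-integer mapping. The real allocator does not hand out sequential integers, and real keys are tuple-encoded bytes rather than Python tuples, so treat this only as an illustration of the shape of the idea.

```python
class ToyInterner:
    """Hypothetical stand-in for a global string -> integer mapping."""

    def __init__(self):
        self._forward = {}

    def resolve(self, name):
        # Allocate the next integer the first time a name is seen,
        # and return the same integer on every later lookup.
        if name not in self._forward:
            self._forward[name] = len(self._forward)
        return self._forward[name]

    def size(self):
        return len(self._forward)


def resolve_path(interner, components):
    # Each component is looked up on its own, without consulting its
    # parent, and the results are concatenated into one key tuple.
    return tuple(interner.resolve(c) for c in components)


interner = ToyInterner()
alice = resolve_path(interner, ["myApp", "user", "alice"])
bob = resolve_path(interner, ["myApp", "user", "bob"])
other = resolve_path(interner, ["otherApp", "user", "alice"])

print(alice, bob, other)
print(interner.size())
```

In this toy model the three paths contain nine components but only five distinct names are interned, because "user" and "alice" are reused across paths. Both paths under "myApp" also share a common prefix, which is what keeps a sub-hierarchy contiguous.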

Another difference in this approach is that the KeySpace hierarchy is actually represented in the key layout. This allows deleting or exporting any sub-hierarchy as a single subspace. With directory layer subdirectories, by contrast, the path turns into inodes and gets flattened, so that nothing is contiguous except within a leaf directory.

There are still a couple of problems with always using the global directory layer like this, though. First, if any keyspace directory has lots of different values (like millions or tens of millions), it can use up the small numbers (which were supposed to be reused) and all new paths start taking more space. Second, for the hierarchy export case mentioned above, some of the subkeys will have directory-layer-specific values, which means they can’t be imported into another cluster with its own global directory layer without translation. To address these, it is possible to use scoped directory layers, which themselves live inside the hierarchy, and so have separate extents (uninfluenced by sibling growth) and are included in the export (making it self-contained from there down).
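As a rough illustration of why running out of small numbers matters, the sketch below computes how many bytes a non-negative integer occupies in the FDB tuple encoding: one type-code byte plus the minimal big-endian representation. The helper name tuple_int_size is made up for this example, and it only covers integers that fit in 8 bytes. The directory layer's allocator is not strictly sequential, so the exact point at which keys get longer will differ from what a simple counter would suggest.

```python
def tuple_int_size(value):
    """Bytes used by a non-negative integer in the FDB tuple encoding:
    one type-code byte plus the minimal big-endian representation."""
    if value < 0:
        raise ValueError("sketch covers non-negative integers only")
    return 1 + (value.bit_length() + 7) // 8


for value in (0, 200, 70_000, 20_000_000):
    print(value, tuple_int_size(value))
```

An allocated value in the tens of millions costs several more bytes than one in the low hundreds, and that cost is paid in every key stored under that path element.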

Hi, Mike,

Thanks for the explanation. Not clear on how the FDB Directory Layer is involved though. Is there any sample code in record layer showing the usage?

I tried saving records to a KeySpace with a single KeySpaceDirectory (string type) as the root, but did not see any directory created in the Directory Layer. Following the code in the record layer, it seems it does convert the KeySpacePath to a Subspace, but with no directory involved. So nothing shows up when the directories are listed with Python code like this:

import fdb

fdb.api_version(630)  # assumption: use the API version that matches your client
db = fdb.open()
stores = fdb.directory.list(db)
print(stores)

Update:
It seems the key is to use DirectoryLayerDirectory as the root KeySpaceDirectory. That does create the directory for the record key space, but it also creates an extra directory ‘recdb_rd_cache’ at the same level. Now the question is: what is this top-level directory created by the FoundationDB record layer?

Thanks,

Randy

Yeah, that’s right: you do have to use a DirectoryLayerDirectory to get the key space to use the directory layer. A regular KeySpaceDirectory will just insert the string directly into the key. Sorry for the confusion.

The Record Layer doesn’t use any (set) top level directory for its data. It’s entirely up to the user and determined wholly by the KeySpacePath provided. So, for example, you may choose to create a root key space directory node that looks like:

KeySpace keySpace = new KeySpace(new DirectoryLayerDirectory("root", "myApplicationName"));
KeySpacePath rootPath = keySpace.path("root");

In this case, “root” is the name of the root key space directory, and its value will always be the integer that “myApplicationName” resolves to in the FDB directory layer. If you create any record store in that path or in any subpath of that path, it will be constructed in a subspace that also begins with that integer (encoded to bytes using the FDB tuple layer). If you were to then call fdb.directory.list(db), you should find an entry for myApplicationName, and in the associated FDB subspace, you’d find all of the data that is stored under that root path.

But you could also imagine a slightly more complicated structure. Something like:

KeySpace keySpace = new KeySpace(new DirectoryLayerDirectory("root", "myApplicationName")
    .addSubdirectory(new DirectoryLayerDirectory("user"))
);
String userId = getUserId();
KeySpacePath userPath = keySpace.path("root").add("user", userId);

In this case, the userPath begins with the same integer from before, but the next element in the tuple will be the user’s ID, turned into an integer using the directory layer. So, now if you call fdb.directory.list(db), you’d find both “myApplicationName” as well as all user IDs, and if you tried to use the directory layer to open any individual ID, you’d find that it’s empty, even if you’ve created record stores with the above key space path.


One further note: the recdb_rd_cache is used to store data for the “reverse directory layer”, i.e., a data structure to turn directory layer values back into strings. This is used by the “list” methods on key spaces, which operate by reading raw keys from the database, and then translating their individual elements back into their original values. The directory layer doesn’t (efficiently) support this natively, so that’s why an additional directory was needed.

Things are a bit simpler if you use the ScopedInterningLayer instead, which has forward and reverse mappings built in. However, it stores its data in a separate place from the FDB directory layer. So, for example, if you have two applications on the same FDB cluster, one of which uses the ScopedInterningLayer to allocate a subspace and the other of which uses the FDB directory layer, then they might end up writing data to the same prefix (even if they coördinate on using separate root directory names).
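The collision risk can be sketched with two toy allocators that keep independent state. ToyAllocator is a hypothetical class written for this example, not the Record Layer or FDB API, and it allocates sequentially, which makes the collision certain here. With the real allocators a collision is only possible, not guaranteed. The sketch also shows the forward and reverse mappings living side by side, which is the convenience mentioned above.

```python
class ToyAllocator:
    """Hypothetical name -> integer allocator with a built-in reverse map."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def resolve(self, name):
        # Allocate on first use and record both directions of the mapping.
        if name not in self._forward:
            value = len(self._forward)
            self._forward[name] = value
            self._reverse[value] = name
        return self._forward[name]

    def reverse(self, value):
        # Turn an allocated integer back into its original string.
        return self._reverse[value]


# Two applications, each relying on an allocator with its own state.
directory_layer_like = ToyAllocator()
interning_layer_like = ToyAllocator()

prefix_a = directory_layer_like.resolve("applicationA")
prefix_b = interning_layer_like.resolve("applicationB")

print(prefix_a, prefix_b)
print(directory_layer_like.reverse(prefix_a))
print(interning_layer_like.reverse(prefix_b))
```

The names differ, but because neither allocator knows about the other, both hand out the same integer, and data written under those prefixes would overlap. Using one allocation mechanism for every top-level prefix on a cluster avoids this.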