The basic idea behind those other directory layer subspaces is to let a multi-level directory hierarchy use independent directory layers that don’t interact. That in turn means that if one node in your key space hierarchy creates lots of children, it won’t affect the children of its siblings (whose keys could otherwise get longer).
For example, you can imagine a key space hierarchy that’s something like (env, db_name), where the environment might be something like “prod” or “qa” and the “db_name” is an arbitrary named database. By default, if you use the Record Layer’s KeySpace abstraction to directory layer-ify those two components, it will add one entry to the global (i.e., \xfe) directory layer for each database name, which in turn means that if you have lots and lots of qa databases, you can end up making the prod databases require more bytes in the resolved mapping. What the scoped directory layers do is let you define separate “prod database name” and “qa database name” directory layer instances that do not interact, so you can create as many qa databases as you want without affecting the length of prod database keys.
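To make the byte-length effect concrete, here is a toy Python model, not the Record Layer or FDB directory layer API. The `ToyDirectoryLayer` class and the database names are made up for illustration, and it assumes ids are handed out sequentially, which is a simplification of the real allocator. The length function follows the FDB tuple encoding for non-negative integers (one type-code byte plus the minimal number of payload bytes).

```python
def encoded_len(n: int) -> int:
    """Bytes used by the FDB tuple encoding of a non-negative integer:
    one type-code byte plus the minimal big-endian payload."""
    return 1 + (n.bit_length() + 7) // 8


class ToyDirectoryLayer:
    """Hypothetical stand-in for a directory layer: maps names to small ints.

    Assumption: ids are allocated sequentially. The real allocator is not
    strictly sequential, but ids still grow as more names are added.
    """

    def __init__(self):
        self._ids = {}

    def resolve(self, name: str) -> int:
        # Allocate the next unused integer the first time a name is seen.
        if name not in self._ids:
            self._ids[name] = len(self._ids)
        return self._ids[name]


# One shared (global) layer: qa databases consume the small ids first.
shared = ToyDirectoryLayer()
for i in range(70_000):
    shared.resolve(f"qa-db-{i}")
shared_prod = shared.resolve("prod-db-main")

# Scoped layers: prod and qa allocate from independent instances.
scoped = {"prod": ToyDirectoryLayer(), "qa": ToyDirectoryLayer()}
for i in range(70_000):
    scoped["qa"].resolve(f"qa-db-{i}")
scoped_prod = scoped["prod"].resolve("prod-db-main")

print(encoded_len(shared_prod), encoded_len(scoped_prod))
```

With a shared layer the prod database lands behind 70,000 qa entries and needs a multi-byte integer. With scoped layers it gets the first id in its own scope, no matter how many qa databases exist.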
That explains the “scoped”-ness of \xfd (i.e., the “extended directory layer”) and \xfc (i.e., the “scoped interning layer”). The other thing you get from the scoped interning layer, which is built on top of the StringInterningLayer, is that the StringInterningLayer is bidirectional, i.e., there are both mappings from string to integer and the reverse mappings from integer to string. To circle back to one of your other questions, this ends up being key when trying to move data between clusters in a multi-cluster deployment: raw data can be read from FDB, then all of the shortened values can be unshortened back into their original strings, and then the data can be written to a new cluster by translating those strings back into integers using the mappings available on the destination cluster.
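The read, unshorten, re-shorten flow described above can be sketched with an in-memory model. This is only an illustration under stated assumptions: `ToyInterningLayer`, its `intern` and `lookup` methods, and `move_record` are hypothetical names, not the StringInterningLayer API, and keys are modeled as plain tuples whose first element is the interned integer.

```python
class ToyInterningLayer:
    """Hypothetical bidirectional string <-> integer mapping for one cluster."""

    def __init__(self, start: int = 0):
        self._to_int = {}   # forward mapping: string -> integer
        self._to_str = {}   # reverse mapping: integer -> string
        self._next = start

    def intern(self, name: str) -> int:
        # Allocate a new integer on first use and record both directions.
        if name not in self._to_int:
            self._to_int[name] = self._next
            self._to_str[self._next] = name
            self._next += 1
        return self._to_int[name]

    def lookup(self, value: int) -> str:
        # Reverse lookup; raises KeyError if the integer was never allocated.
        return self._to_str[value]


def move_record(key, value, source, dest):
    """Translate a key from the source cluster's integers to the destination's.

    Assumption: the first key element is the interned integer.
    """
    interned, *rest = key
    name = source.lookup(interned)              # unshorten on the source
    return (dest.intern(name), *rest), value    # re-shorten on the destination


# The two clusters assign different integers to the same string.
source = ToyInterningLayer()
source.intern("other-db")
src_id = source.intern("my-db")
dest = ToyInterningLayer(start=100)

new_key, new_value = move_record((src_id, "record", 42), b"payload", source, dest)
print(new_key, dest.lookup(new_key[0]))
```

The integer in the key changes between clusters, but the string it stands for does not. That only works because the source cluster can map the integer back to the string.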
FWIW, the FDBReverseDirectoryCache allows the same bidirectionality to be tacked onto an existing directory layer implementation. (Note that even though it’s called a cache, it’s a persistent cache and is sort of key to using the reverse lookup features of the keyspace path, so maybe cache is the wrong name.) My suggestion, if you’re thinking of doing this, would be to start with something more like a StringInterningLayer, but if you had existing data in FDB directory layers, you could add something like the reverse directory cache on top.
I’ll also note that due to caching, the Record Layer can’t really make use of the directory layer’s renaming features (at least not without rethinking cache invalidation). I suspect that if we’d had Redwood and prefix compression when the Record Layer was being originally developed, we’d seriously have considered just relying on prefix compression instead of all of that because that would have significantly simplified cross-cluster data movement (and, if we’re honest, single cluster writes).