What's the purpose of the Directory layer?

alloc · September 3, 2018, 6:59pm

As suggested, the directory layer does provide something that resembles a file-system’s hierarchical directory structure on top of FoundationDB. I think, though, that instead of files being key-value pairs, files are more like “subspaces”. In other words, each logical directory path (e.g., “my-application/my-collection/my-table” in directory notation or (“my-application”, “my-collection”, “my-table”) in tuple notation) is translated into a physical byte string (e.g., \x16\x04\x03). The directory layer guarantees that no other directory layer path maps to that same physical string, so if you wanted to have multiple logical “tables” or “collections” (or whatever makes sense in your data model), you can assign each one a directory path and get a subspace in which you can put all of the data for each such table or collection (or whatever).

But, in theory, you could just use the subspace layer directly and get the same level of data isolation. For example, you could just create a subspace whose prefix tuple is the same as the directory layer path you were giving to the directory layer, and everything would still work. The differences are:

The directory layer generates much shorter prefixes. In particular, it will assign each path a tuple-encoded integer, which (because the tuple layer uses a variable length encoding) means that if you have fewer than 65,536 directories, you will only need 3 bytes per prefix at most. The subspace layer, by contrast, if given strings will produce a prefix which is (slightly) larger than the string itself.
The directory layer supports renaming prefixes without moving any of the underlying data. In particular, directory layer “moves” manipulate the logical-to-physical mapping data structures, but not the data themselves.
The directory layer can be queried. So, for example, because it’s possible to know if someone else has already used the same path, you can check. (The list method you mentioned can be useful for this, and also useful for recursively searching down different directory paths.)
But, the directory layer requires an extra database read. In particular, before you can get the prefix, you have to ask the database what it is. You can cache this value…but then you can’t (safely) make use of the move or remove features of the directory layer unless you have a good solution to the cache invalidation problem.
There is no relationship between the physical locations of different subdirectories’ subspaces. This can be both good and bad. On the good side, it means that (1) your prefixes can be shorter and (2) you can move subdirectories around without moving data. On the bad side, it means that you can’t do something like issue a single range delete to remove all of the data in a directory and its subdirectories. You also can’t do a single range scan and get all of the data if you are interested in copying the data from one cluster to another.
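To make the prefix-size point a bit more concrete, here is a minimal sketch of the two tuple-layer encodings involved, written in plain Python rather than with the real `fdb.tuple` module. It only handles non-negative integers below 2^64 and Unicode strings, and the helper names (`pack_int`, `pack_str`, `pack`) are made up for illustration. The integer 1027 is just an example value, chosen because it encodes to the \x16\x04\x03 prefix mentioned earlier.

```python
def pack_int(n: int) -> bytes:
    """Tuple-layer encoding of a non-negative integer below 2**64 (sketch only)."""
    if not 0 <= n < 2**64:
        raise ValueError("this sketch only handles 0 <= n < 2**64")
    size = (n.bit_length() + 7) // 8  # minimal byte count; 0 bytes for n == 0
    # Type code 0x14 means zero; 0x14 + size means a positive integer of `size` bytes.
    return bytes([0x14 + size]) + n.to_bytes(size, "big")


def pack_str(s: str) -> bytes:
    """Tuple-layer encoding of a Unicode string: type code, escaped UTF-8, terminator."""
    return b"\x02" + s.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"


def pack(items) -> bytes:
    """Concatenate the encodings of each element (ints and strings only)."""
    return b"".join(pack_int(i) if isinstance(i, int) else pack_str(i) for i in items)


path = ("my-application", "my-collection", "my-table")
subspace_prefix = pack(path)       # prefix if the path is used directly as a subspace
directory_prefix = pack_int(1027)  # the kind of prefix a directory might be assigned

print(len(subspace_prefix))   # 41
print(directory_prefix)       # b'\x16\x04\x03'
```

So the path used directly as a subspace costs 41 bytes on every key, while the integer prefix costs 3.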

As to whether the directory layer is optimized for dynamic directory creation, I’d say it is. It doesn’t use versionstamps for high throughput, but it uses an internal class called the “high contention allocator” to generate short prefixes in a way that holds up under high contention (though it’s not zero contention). Another member of the community actually wrote up a description of how the high-contention allocator works: https://www.activesphere.com/blog/2018/08/05/high-contention-allocator

That being said, you can also use the directory layer if you have only a few directories or if your directories are fairly static. Because of the extra DB reads, you have to be a little careful: there is typically only one directory layer per cluster (though you can create additional ones), so if you query it too frequently, you can create “hot keys” in the directory layer’s subspace (by default, the \xfe subspace). So, if you are primarily using the directory layer as a way of producing short prefixes from long paths (which is a reasonable enough use case), my suggestion would be to heavily cache the directory layer results (and just accept that you will never be able to “remove” or “move” the path, which is fine for many use cases).
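The caching I have in mind could look something like the sketch below. Here `resolve` stands in for whatever call actually reads the directory layer, and both it and the `DirectoryCache` class are hypothetical names, not part of any binding. Entries never expire, so this is only safe under the assumption that paths are never moved or removed, and it makes no attempt at thread safety.

```python
from typing import Callable, Dict, Tuple

Path = Tuple[str, ...]


class DirectoryCache:
    """Remembers path -> prefix so each path costs at most one directory read."""

    def __init__(self, resolve: Callable[[Path], bytes]) -> None:
        self._resolve = resolve                # hypothetical: performs the database read
        self._prefixes: Dict[Path, bytes] = {}

    def prefix(self, path: Path) -> bytes:
        if path not in self._prefixes:         # only the first lookup hits the database
            self._prefixes[path] = self._resolve(path)
        return self._prefixes[path]


# A fake resolver that records how often the "database" is consulted.
reads = []

def fake_resolve(path: Path) -> bytes:
    reads.append(path)
    return b"\x15" + bytes([len(reads)])       # made-up short prefix


cache = DirectoryCache(fake_resolve)
first = cache.prefix(("my-application", "my-table"))
second = cache.prefix(("my-application", "my-table"))
print(first == second, len(reads))  # True 1
```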

I guess the mistake that those two layers are trying to help you avoid is doing something like creating a data model where you end up using the entire keyspace of the cluster, then you need to put some more meta-data somewhere else, and then you have nowhere to put it, because you’ve used the entire keyspace for the first part of your application. For example, you could imagine a simple data model where keys are some primary key and values are, say, some named tuple (serialized using your favorite named-tuple serializer):

(key1) -> {a: 10, b: "val"}
(key2) -> {a: 66, c: "var"}

But then you decide you want an index on the “a” field of every named-tuple, something like:

10 -> (key1)
66 -> (key2)

But where does it go? If integers are valid keys, then it’s possible you will have keys that intersect with your index…and that’s no good. But if you used subspaces, you might do something like:

("primary", key1) -> {a: 10, b: "val"}
("primary", key2) -> {a: 66, c: "var"}
("secondary", "a_index", 10) -> (key1)
("secondary", "a_index", 66) -> (key2)
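Here is a toy version of that collision and the fix. A plain Python dict stands in for the database and Python tuples stand in for tuple-encoded keys, so none of this touches the real bindings; the `scan` helper is just something I made up to imitate a prefix range read.

```python
# Without subspaces: records and index entries share one flat keyspace.
flat = {}
flat[(10,)] = {"a": 66, "c": "var"}   # record whose primary key happens to be 10
flat[(1,)] = {"a": 10, "b": "val"}    # record whose primary key is 1
flat[(10,)] = (1,)                    # index entry "a == 10 -> key 1" clobbers record 10

# With subspaces: each kind of data lives under its own prefix.
store = {}
store[("primary", 10)] = {"a": 66, "c": "var"}
store[("primary", 1)] = {"a": 10, "b": "val"}
store[("secondary", "a_index", 10)] = (1,)
store[("secondary", "a_index", 66)] = (10,)


def scan(db, prefix):
    """Return (key, value) pairs whose key starts with `prefix`, in key order."""
    matches = [(k, v) for k, v in db.items() if k[:len(prefix)] == prefix]
    return sorted(matches, key=lambda kv: kv[0])


print(flat[(10,)])                                # (1,)
print([k for k, _ in scan(store, ("primary",))])  # [('primary', 1), ('primary', 10)]
print(scan(store, ("secondary", "a_index", 10)))  # [(('secondary', 'a_index', 10), (1,))]
```

In the flat version the record with primary key 10 is silently lost; with the prefixes in place, both records and both index entries survive and can be scanned separately.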

I guess I’d also note that even if it’s not necessarily a “mistake” people make, the subspace layer (assisted by the directory layer) is required to implement multi-tenancy in any sane way. If you have multiple users/applications sharing the same FoundationDB cluster, subspaces and directories are the easiest way to achieve data isolation (with each user getting their own directory/subspace).

When is it a bad idea to use the directory layer?

Well, as mentioned above, if you only have a few, relatively static directories, it’s usually a bad idea to naïvely use the directory layer because you might run into hot keys in an instance where the results can be easily cached. (You also save yourself a round trip or two.) I suppose you can also run into hot keys (or hot ranges) in general with the directory layer. For that reason, you might need to maintain multiple directory layers and load balance between them if you can’t cache.

Also, even though the directory layer is designed to handle multiple incoming requests fairly well, if you create directories too frequently, you can run into issues stemming from the fact that (1) as you create more directories, the prefix must necessarily get larger (at a logarithmic rate) and (2) as you create more directories, you must store meta-data about the fact that the directory exists. So, for example, the “worst case” might be something like every time you store a new key, you create a new directory just for it. Then because of book-keeping information, you are more than doubling the number of keys and number of key bytes used to store this single key. Yuck.

I guess I’d propose this rule of thumb: the directory layer should be used to create keyspaces to distinguish different uses of the FoundationDB cluster, while the subspace layer should be used directly to separate concerns within one single usage.

I’m not sure that that rule makes any sense as I type it out, but let’s take the example of a simple data model from primary key to named tuple. In this instance, I’d probably say that keeping track of the “primary key” and index information is a use within a single usage, so one should probably use the subspace layer directly (probably choosing short names for the primary keyspace and secondary index subspaces, perhaps the integers 1 and 2, maybe reserving 0 for meta-data?). But let’s say you wanted to run two copies of this simple system on the same cluster. Then maybe you’d give each collection its own directory prefix. Something like:

dir: "/use1", 1, key1 -> { a: 10, b: "val"}
dir: "/use1", 1, key2 -> { a: 66, c: "var"}
dir: "/use1", 2, "a", 10 -> key1
dir: "/use1", 2, "a", 66 -> key2

dir: "/use2", 1, key3 -> { a: 15 }
dir: "/use2", 1, key4 -> { a: 14, d: false }
dir: "/use2", 2, "a", 14 -> key4
dir: "/use2", 2, "a", 15 -> key3
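As a rough sketch of how those keys might be assembled: below, each directory path is mapped to a short byte prefix, and the rest of the key is a tuple packed underneath it. The prefixes in `directories` are invented stand-ins for whatever the directory layer would actually hand back, and `pack`, `record_key`, and `index_key` are illustrative helpers (handling only non-negative integers and strings), not the real tuple or subspace API.

```python
def pack(items) -> bytes:
    """Minimal tuple-layer-style encoding (sketch: non-negative ints < 2**64 and strings)."""
    out = b""
    for item in items:
        if isinstance(item, int):
            size = (item.bit_length() + 7) // 8
            out += bytes([0x14 + size]) + item.to_bytes(size, "big")
        else:
            out += b"\x02" + item.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"
    return out


RECORDS, INDEXES = 1, 2  # short subspace names within one usage

# Hypothetical prefixes, standing in for what the directory layer would assign.
directories = {"/use1": pack((1,)), "/use2": pack((2,))}


def record_key(use: str, primary_key: str) -> bytes:
    return directories[use] + pack((RECORDS, primary_key))


def index_key(use: str, field: str, value: int) -> bytes:
    return directories[use] + pack((INDEXES, field, value))


keys = sorted([
    record_key("/use2", "key3"), index_key("/use1", "a", 66),
    record_key("/use1", "key1"), index_key("/use2", "a", 15),
    record_key("/use1", "key2"), index_key("/use1", "a", 10),
])

# Everything for "/use1" shares its prefix, so it forms one contiguous run of keys.
use1 = [k for k in keys if k.startswith(directories["/use1"])]
print(len(use1), keys[:len(use1)] == use1)  # 4 True
print(len(record_key("/use1", "key1")))     # 10
```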

(Another thing you could do is, for each usage, keep track of its own directory layer that has a mapping of, say, index or table names to short prefixes that you use within that copy of your system, which avoids one use case spilling into another and, say, filling up its directory layer. This also helps balance out the load rather than sending everything to the one-and-only directory layer and creating hot keys.)

Hopefully this is at least somewhat cogent. If not, feel free to ask follow-up questions.