Questions about the Directory partitions

PierreZ · January 18, 2021, 1:24pm

Hi!

I’m currently implementing the Directory for the unofficial Rust client and I’m confused about the idea of the DirectoryPartition.

I think something is wrong with my mental model of the Directory, and I would love some help finding which part.

On the Python bindings, we have this:

    def _contents_of_node(self, node, path, layer=None):
        prefix = self._node_subspace.unpack(node.key())[0]

        if layer == b'partition':
            return DirectoryPartition(self._path + path, prefix, self)
        else:
            return DirectorySubspace(self._path + path, prefix, self, layer)

It looks like a DirectoryPartition is a kind of nested Directory inside a node_subspace, which is confirmed by the related Go struct:

type directoryPartition struct {
	directoryLayer
	parentDirectoryLayer directoryLayer
}

I do not see the point of having another Directory with its own {node, hca} subspaces inside the node_subspace of the parent.

I also don’t understand the relevant documentation:

Under normal operation, a directory does not share a common key prefix with its subdirectories. As a result, related directories are not necessarily located together in key-space.

Are we talking about the content_subspace or the node_subspace? Both of them have a key-prefix:

the content_subspace has a prefix for all sub-directories,
the node_subspace can be configured at creation.

This means that you cannot use a single range query to read all contents of a directory and its descendants simultaneously, for example.

Why can’t we *just* scan the node_subspace? Here’s how I represent the directory, for a path like []string{"app", "my-app"}:

               +
               |
               | version = (1,0,0)              # Directory's version
               |
               |      +
               | "hca"|                          # used to allocate numbers like 12 and 42
               |      +
     \xFE      |
    node's     | (0,"app")=12                    # id allocated by the hca for "path"
   subspace    | (0,"app","layer")=""            # layer allows an ownership mechanism
               |
               |
               | (0,"app",0,"my-app","layer")="" # layer allows an ownership mechanism
               | (0,"app",0,"my-app")=42         # id allocated by the hca for "layer"
               +


               +
               |
               |
   (12,42)     |
   content     | # data's subspace for path "app","my-app"
  subspace     |
               |
               +

As we are packing any nested path in the previous subspace (here "my-app" is a Subspace packed from the "app" subspace, itself packed from the node_subspace), I don’t see why we cannot scan between

(\xFE, 0) and (\xFE, 1) for the root_node,
(\xFE, 0, ...some nested path..., 0) and (\xFE, 0, ...some nested path..., 1) for any nested node,

and filter out the “layer” row keys?

For most applications this behavior is acceptable, but in some cases it may be useful to have a directory tree in your hierarchy where every directory shares a common prefix.

I guess here we are talking about the content_subspace, but we do have a common prefix: the parent. In my example, "my-app" is prefixed by the integer generated for "app".

Thanks for reading, and stay safe,
PierreZ

ajbeamon · January 20, 2021, 11:34pm

This is referring to the content subspace, which is disjoint between a parent and its children. The directory paths are hierarchical as stored in the node subspace, but a parent and its children will be allocated prefixes separately and neither will be a prefix of the other.

This is actually necessary if we want to support cheap directory moves. We can’t cheaply move all of the data in the content subspace, but we can easily update the indirection map in the node subspace to say directory X is now a child of directory Y.
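To make the indirection concrete, here is a minimal in-memory sketch. It is a toy model, not the real directory layer: `ToyDirectoryLayer`, its two dicts, and the counter standing in for the HCA are all hypothetical names invented for illustration. It only shows that a move rewrites node entries and leaves the content keys alone.

```python
import itertools


class ToyDirectoryLayer:
    """In-memory toy model; NOT the real FoundationDB directory layer."""

    def __init__(self):
        self._ids = itertools.count(1)  # hypothetical stand-in for the HCA
        self.nodes = {}     # "node subspace": path tuple -> allocated prefix
        self.contents = {}  # "content subspace": (prefix, key) -> value

    def create(self, path):
        # Every directory gets its own prefix, unrelated to its parent's.
        prefix = next(self._ids)
        self.nodes[tuple(path)] = prefix
        return prefix

    def move(self, old, new):
        old, new = tuple(old), tuple(new)
        # Only the path -> prefix entries are rewritten (the directory and
        # its descendants); nothing in self.contents is touched.
        for path in [p for p in self.nodes if p[:len(old)] == old]:
            self.nodes[new + path[len(old):]] = self.nodes.pop(path)
```

In this model, moving `("app",)` to `("prod",)` renames two node entries, while the data stored under the allocated prefixes stays where it is.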

As far as I know, directory partitions are a little-used feature, and I don’t remember many details of the particular use case that motivated them. The basic idea, though, is that all of the directory partition’s subdirectories will be allocated within the content subspace of the partition, so you can read the whole sub-tree by scanning one prefix.
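A rough sketch of the difference, again using a plain dict rather than a real database. The helper names `allocate` and `scan_prefix`, the one-byte ids, and the "tenant" directory names are assumptions made up for this example; real allocated prefixes look different, and a real partition would also keep its own node metadata under its prefix.

```python
import itertools

_ids = itertools.count(1)  # hypothetical stand-in for the HCA


def allocate(within=b""):
    """Allocate a fresh one-byte id, appended to `within` (empty at the root)."""
    return within + bytes([next(_ids)])


def scan_prefix(store, prefix):
    """Emulate one range read: all keys starting with `prefix`, in key order."""
    return [k for k in sorted(store) if k.startswith(prefix)]


store = {}

# Ordinary directories: parent and child get unrelated prefixes.
app = allocate()
my_app = allocate()
store[app + b"/a"] = b"data in app"
store[my_app + b"/b"] = b"data in app/my-app"

# Partition: children are allocated inside the partition's own prefix.
tenant = allocate()
tenant_users = allocate(within=tenant)
tenant_logs = allocate(within=tenant)
store[tenant_users + b"/u"] = b"data in tenant/users"
store[tenant_logs + b"/l"] = b"data in tenant/logs"

ordinary_scan = scan_prefix(store, app)      # misses app/my-app
partition_scan = scan_prefix(store, tenant)  # covers both subdirectories
```

Scanning the ordinary parent's prefix finds only the parent's own keys, while scanning the partition's prefix finds the keys of every subdirectory under it.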

josephg · January 21, 2021, 5:20am

I wrote the (unofficial) nodejs frontend. When I was porting the directory layer to nodejs code I puzzled over some similar ideas.

I think the idea of DirectoryPartition is that it allows ‘physical’ separation between applications / database contents via prefixes rather than just ‘logical’ separation (what you’d get by putting each application into its own directory).

So if you have two applications a and b in your fdb database, you can completely isolate each application to a/ and b/ subspaces, without any chance of crosstalk. And that might be useful if you later wanted to move application a to a different database, or back up application b independently of a, and stuff like that.

That said, my instinct is that whoever wrote the directory layer designed it that way because they believe singletons are bad. The design makes the RootDirectoryLayer the only special object in the directory layer, with a hardcoded path. That hinge exists as a maker’s mark even if nobody ever uses it. I thought about taking that out (and generally tidying the whole directory layer up) but the directory layer fuzzer is very picky. And the other implementations of the directory layer are all almost identical. So I ended up favoring consistency and just porting the directory layer almost directly.

PierreZ · January 25, 2021, 9:03am

Thank you both for your replies!

Strange, I thought that for a path like /app/my-app, the content subspace would be something like (12, 42). I will test that.

Thanks, I will go through your implementation.

The isolation is brought by the HCA, no? And in a “complex” path, isolation is brought by the HCA’s allocation of the parent? Am I mistaken?

josephg · January 25, 2021, 12:20pm

Sure thing. Feel free to reach out if you have questions, though my comments probably contain most of my remaining memory of that code.

Yes, but they share the same HCA & HCA pool. If you used separate DirectoryLayers under two raw prefixes then the code to split the data apart wouldn’t need to understand the directory layer at all - it could copy all the keys under the prefixes blindly. But I suspect the number of applications which have ever done that could be counted on one hand. If the Rust implementation was missing that feature, I doubt anyone would lose sleep over it.
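The "copy blindly" idea could look something like the sketch below. It uses plain dicts in place of two databases, and `copy_prefix` is a hypothetical helper, not part of any binding. The point is only that the copier never decodes a key or consults directory metadata.

```python
def copy_prefix(src, dst, prefix):
    """Copy every key starting with `prefix` from src to dst, unmodified.

    `src` and `dst` are plain dicts standing in for two databases.
    Returns the number of keys copied.
    """
    copied = 0
    for key, value in src.items():
        if key.startswith(prefix):
            dst[key] = value  # key bytes are carried over as-is
            copied += 1
    return copied
```

Because both the directory metadata and the data of an application sit under the same raw prefix in that setup, one such copy would carry everything the application needs.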
