Redwood engine and Directory layer

The Tokio-based Rust bindings that I am currently working on are now passing the binding tester tests reliably.

As I started investigating adding the directory layer to the bindings, I realized that there is quite a bit of code, both in FoundationDB and in the Record Layer, that exists mainly to work around the lack of prefix compression.

I am currently having second thoughts about adding the directory layer to the bindings. Adding the code is the easy part; it’s the maintenance and support that I am really wary about. So, if I don’t have to provide a directory layer, I would prefer that.

Could I please get feedback on the following.

  1. Is supporting the Directory Layer something that is really needed for FDB 7.0 and beyond with the Redwood engine? If so, what aspects of the Directory Layer API and the Record Layer’s KeySpace API might users find useful?

  2. How can I try the Redwood storage engine with prefix compression today? Does it require building 7.0 from source? I tried looking for documentation on this but could not find any.

There is still a fair bit of work ahead of me before the Rust bindings reach a state that will be useful to ordinary Rust users. A lot of documentation work is still pending. In addition, I think we will also need minimal Record Layer-like features - schema management, secondary indexes, a simple query execution engine - which can only be started after the binding layer is done.

So, if I can avoid having to write the directory layer, I would really prefer it! :slight_smile:

How is that related to prefix compression?

On the current release-7.0 branch, Redwood can be used through the Testing Storage Server (TSS) feature. There’s some documentation here. This allows you to run Redwood alongside your regular storage servers. I’m not sure about Redwood as a main storage engine for release-7.0.

I tested Redwood by building the release-7.0 branch. I added some documentation on how to build FDB on the dedicated repo.

As a side note, I’ve built some GitHub Actions workflows that build the release branches every Sunday. You might find them interesting.

The directory layer also provides a level of indirection in addition to short, unique prefixes. E.g. you can rename a directory without moving any of its contents.


Thanks @andrew.noyes @PierreZ for the reply and the pointers.

The DirectoryLayer allows you to associate a path with a Subspace. The generated subspace prefix will be an int64 allocated by the HCA (high-contention allocator). A nice explanation of the HCA is available here.

This means that, rather than using a long Tuple as a key-prefix for your app, like ("application", "my-app", "env", "preprod", "tenant", "my-tenant", "tenant-1"), you will have a single int64 to represent this path, creating a sort of “prefix compression”, at the cost of reading/maintaining the \xFE subspace, which holds the DirectoryLayer’s metadata.
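As a minimal sketch with the Python bindings (the path and key below are just for illustration):

```python
import fdb
fdb.api_version(630)

db = fdb.open()

# Resolve the long logical path to a short, HCA-allocated prefix.
# The path -> prefix mapping itself lives under the \xFE subspace.
tenant_dir = fdb.directory.create_or_open(
    db, ('application', 'my-app', 'env', 'preprod', 'tenant', 'my-tenant', 'tenant-1'))

# Keys written through the directory subspace carry only the short prefix,
# not the full tuple path.
@fdb.transactional
def write_row(tr, row_id, value):
    tr[tenant_dir.pack((row_id,))] = value

write_row(db, 42, b'hello')
```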

You are absolutely right, this is what convinced me to use the DirectoryLayer :blush: Back in my example, moving
("application", "my-app", "env", "preprod", "tenant", "my-tenant", "tenant-1")
to
("application", "my-app", "env", "prod", "preprod", "my-tenant", "tenant-42")

will only be an administrative operation, i.e. changing some keys in the DirectoryLayer’s metadata, without rewriting the keys that hold the data.
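For instance, with the Python bindings the rename boils down to a single directory move (paths shortened here for readability):

```python
import fdb
fdb.api_version(630)
db = fdb.open()

old_path = ('application', 'my-app', 'env', 'preprod', 'tenant-1')
new_path = ('application', 'my-app', 'env', 'prod', 'tenant-42')

# Only the directory layer's metadata under \xFE is rewritten; the data keys
# under the allocated prefix stay exactly where they are.
moved_dir = fdb.directory.move(db, old_path, new_path)
```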

In addition to the \xFE subspace that the FDB directory layer uses, the Record Layer also uses the subspaces \xFD and \xFC.

When using the Redwood storage engine, the main benefits of the Directory Layer become the ability to move/rename directories and the smaller keys sent in network messages (though some of these may eventually use prefix compression as well).

The main benefit of not using the Directory Layer is avoiding additional reads to resolve directories to their shortened unique prefixes. It is also worth pointing out that without the Directory Layer it becomes easier to move/copy part of the keyspace from one FDB cluster to another, which can be useful for multi-cluster deployments. With the Directory Layer such a move/copy must involve reconciling the likely-different directory → prefix mappings of the two clusters and possibly translating the copied keys to match the destination cluster.

All releases of FDB 7.0 will have Redwood available for use as a storage engine with the name ssd-redwood-1-experimental. This means you can configure it as your cluster’s main storage engine (but note the experimental label!) or you can use the new TSS feature to test Redwood against your workload with zero risk to durability, availability, or performance of your FDB cluster.
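For example, on a 7.0 cluster you should be able to select it from fdbcli (shown here for a new single-machine test cluster; adjust the redundancy mode for anything real):

```
fdb> configure new single ssd-redwood-1-experimental
```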

While Redwood is still marked experimental, in practice it is very stable and performs very well. There are no known bugs, and the code has passed many millions of correctness tests. It is still marked experimental mainly because it is not being used in production anywhere yet, and there are some minor but not-backward-compatible changes being made in it for FDB 7.1.

Funny, I wasn’t aware of those subspaces; they seem not to be used in the classic bindings. :thinking: I was always curious why the default Directory Layer uses an empty subspace as the content subspace. Maybe someone from the RL team could also explain the idea behind subspaces \xFD and \xFC?

You are confirming my worries about multi-cluster deployments and the DirectoryLayer. I guess this is something carefully designed within CloudKit’s control plane to move customers around. Are there some thoughts about this somewhere on the forum?

Awesome, thanks for the clarification :+1:

The basic idea behind those other directory layer subspaces is to allow, in a multi-level directory hierarchy, independent directory layers that don’t interact. That in turn means that if one node in your key space hierarchy creates lots of children, it won’t affect the children of its siblings (which could otherwise make their keys longer).

For example, you can imagine a key space hierarchy that’s something like (env, db_name), where the environment might be something like “prod” or “qa” and the “db_name” is an arbitrarily named database. By default, if you used the record layer’s KeySpace abstraction to directory layer-ify those two components, it will add one entry to the (global, i.e., \xfe) directory layer for each database name, which in turn means that if you have lots and lots of qa databases, you can end up making the prod databases require more bytes in the resolved mapping. What the scoped directory layers do is let you define separate “prod database name” and “qa database name” directory layer instances that do not interact, and so you can create as many qa databases as you want without affecting the length of prod database keys.
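This is not the Record Layer’s actual implementation, but the same scoping idea can be sketched with the standard bindings by giving each scope its own DirectoryLayer instance (the subspace locations below are made up):

```python
import fdb
fdb.api_version(630)
db = fdb.open()

def scoped_directory_layer(scope):
    # Hypothetical location: each scope gets its own node/content subspaces,
    # and therefore its own independent prefix allocator.
    root = fdb.Subspace(('scopes', scope))
    return fdb.DirectoryLayer(node_subspace=root['nodes'],
                              content_subspace=root['content'])

prod_dirs = scoped_directory_layer('prod')
qa_dirs = scoped_directory_layer('qa')

# Allocations in the qa scope no longer influence prefix lengths in prod.
prod_db = prod_dirs.create_or_open(db, ('db_name', 'orders'))
qa_db = qa_dirs.create_or_open(db, ('db_name', 'scratch-1234'))
```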

That explains the “scoped”-ness of \xfd (i.e., the “extended directory layer”) and \xfc (i.e., the “scoped interning layer”). The other thing you get from the scoped interning layer, which is built on top of the StringInterningLayer, is that the StringInterningLayer is bidirectional, i.e., there are both mappings from string to integer and reverse mappings from integer to string. To circle back to one of your other questions, this ends up being key when trying to move data between clusters in a multi-cluster deployment: raw data can be read from FDB, all of the shortened values can be expanded back into their original form, and then the data can be written to the new cluster by translating those values again using the mappings available on the destination cluster.

FWIW, the FDBReverseDirectoryCache allows the same bidirectionality to be tacked onto an existing directory layer implementation. (Note that even though it’s called a cache, it’s a persistent cache and is sort of key to using the reverse lookup features of the keyspace path, so maybe cache is the wrong name.) My suggestion, if you’re thinking of going this way, would be to start with something more like a StringInterningLayer; but if you already had existing data in FDB directory layers, you could add something like the reverse directory cache on top.
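Not the Record Layer’s StringInterningLayer itself, but a toy sketch of the bidirectional interning idea, assuming a made-up pair of subspaces and a deliberately naive allocator:

```python
import fdb
fdb.api_version(630)
db = fdb.open()

# Two made-up subspaces: one maps string -> int, the other int -> string.
forward = fdb.Subspace(('intern', 'fwd'))
reverse = fdb.Subspace(('intern', 'rev'))

@fdb.transactional
def intern(tr, name):
    existing = tr[forward.pack((name,))]
    if existing.present():
        return fdb.tuple.unpack(existing)[0]
    # Naive sequential allocator for illustration only; a real layer would
    # use the high-contention allocator instead.
    new_id = len(list(tr[reverse.range()]))
    tr[forward.pack((name,))] = fdb.tuple.pack((new_id,))
    tr[reverse.pack((new_id,))] = fdb.tuple.pack((name,))
    return new_id

@fdb.transactional
def lookup(tr, interned_id):
    # The reverse mapping lets shortened values be expanded back to strings,
    # e.g. when moving data to another cluster.
    return fdb.tuple.unpack(tr[reverse.pack((interned_id,))])[0]
```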

I’ll also note that due to caching, the Record Layer can’t really make use of the directory layer’s renaming features (at least not without rethinking cache invalidation). I suspect that if we’d had Redwood and prefix compression when the Record Layer was being originally developed, we’d seriously have considered just relying on prefix compression instead of all of that because that would have significantly simplified cross-cluster data movement (and, if we’re honest, single cluster writes).


Thanks for the complete explanation @alloc :pray:

Thanks @alloc and @SteavedHams for the reply.

It is also worth pointing out that without the Directory Layer it becomes easier to move/copy part of the keyspace from one FDB cluster to another, which can be useful for multi-cluster deployments. With the Directory Layer such a move/copy must involve reconciling the likely-different directory → prefix mappings of the two clusters and possibly translating the copied keys to match the destination cluster.

The main problem here is that using a directory layer per cluster does not support generating globally unique prefixes across your clusters. If you are converting some globally unique data into a prefix with the directory layer, then you lose your uniqueness and could have collisions with other clusters.

You might be able to work around this by using a unique content subspace per cluster, which would make each allocated prefix unique again when combined with the cluster prefix. Directories moved to another cluster could then be registered using the APIs that support specifying raw prefixes and wouldn’t collide with any directories there. I haven’t actually tried this, so I’m not certain if there are any gotchas I’m missing.
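A hedged sketch of that workaround with the Python bindings (the cluster identifier and paths are made up, and as noted above this is untested):

```python
import fdb
fdb.api_version(630)
db = fdb.open()

# Give this cluster's directory layer a unique content subspace, so every
# HCA-allocated prefix is automatically namespaced by the cluster.
cluster_prefix = ('cluster', 'us-east-1')  # made-up identifier
dir_layer = fdb.DirectoryLayer(content_subspace=fdb.Subspace(cluster_prefix))

# Normal allocation on the source cluster.
tenant = dir_layer.create_or_open(db, ('tenant', 'my-tenant'))

# On a destination cluster whose DirectoryLayer was built with
# allow_manual_prefixes=True, the directory could be re-registered under its
# original raw prefix so copied keys keep working without rewriting:
# dest_dir_layer.create(dest_db, ('tenant', 'my-tenant'), prefix=tenant.key())
```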

I was under the impression that the Redwood engine would bring “automatic” prefix compression, making the directory layer unnecessary for the purpose of having “small prefixes”.

That’s right. If you do not need any of the other capabilities offered by the Directory Layer beyond shortening stored keys, then Redwood will transparently do this for you.


The Directory Layer is useful for allocating subspaces dynamically at runtime, e.g. for new users, while keeping related keys co-located to speed up seeks. Instead of the following subspaces:

(profile 
  (user_x ...)
  (user_y ...)
  (user_z ...))
(todo
  (user_x ...)
  (user_y ...)
  (user_z ...))
(mail ...)
(app_3 ...)
...
(app_n ...)

One can have the following schema:

(user_x
  (profile ...)
  (todo ...)
  (mail ...)
  (app_3 ...)
  ...
  (app_n ...))

(user_y
  (profile ...)
  (todo ...)
  (mail ...)
  (app_3 ...)
  ...
  (app_n ...))

(user_z
  (profile ...)
  (todo ...)
  (mail ...)
  (app_3 ...)
  ...
  (app_n ...))

Because a user who is online will access related keys, there is a greater chance those keys are already in cache if they are co-located.
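A small sketch of the second layout with the Python bindings (the app names mirror the placeholders above):

```python
import fdb
fdb.api_version(630)
db = fdb.open()

# Allocate one directory per user at runtime; each app then nests its data
# under that user's short prefix, keeping a user's keys next to each other.
def open_user_app(db, user, app):
    return fdb.directory.create_or_open(db, (user, app))

profile = open_user_app(db, 'user_x', 'profile')
todo = open_user_app(db, 'user_x', 'todo')

@fdb.transactional
def add_todo(tr, item_id, text):
    tr[todo.pack((item_id,))] = fdb.tuple.pack((text,))

add_todo(db, 1, 'try Redwood')
```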