I’m trying to get my head around directories and understand their design intent. My interpretation of the developer guide is that directories are intended to model the large, static ideas in an application. For example, for a forum application, you might have “users” and “topics.”
My interpretation is also that having a subdirectory for each user’s individual posts might be a bit much. For example, this might stretch the intent of the directory system:
directory:posts
    directory:1944f2cd-e3c1-4f24-b031-ea5c231624de
        key:post01
        key:post02
    directory:74e1106e-4962-4622-8369-ad8b2ee1cc53
        key:post77
        key:post89
…but perhaps I’m inventing obstacles where there really are none. So, broadly speaking, are per-user subdirectories in line with the design intent? How many directories is “too many?” Is there a point where having a large number of directories starts to have performance side effects?
I’ve thought about this a little more and discussed it with some colleagues, and I think I can articulate the tradeoffs with using directories in a use case like this a little more clearly. Ultimately, what I’ve learned isn’t really novel; it’s mostly just internalizing what’s already been covered in “What’s the purpose of the Directory layer?” (post #2 by alloc) and applying it more specifically to our immediate use case. Still, I wanted to share the summary here in case it’s helpful to somebody else later.
For us, the main reason we’d want to use directories in a situation like this is for shorter key prefixes; we wouldn’t likely be making much (or really any) use of things like moving or renaming directories. Our use case would be “inboxes” for a relatively high-traffic messaging service. The idea of shorter prefixes for each message key is—on the surface, at least—appealing because we’d be inserting/reading a bazillion messages, and saving space/bandwidth on the keys seems like a win.
The problem for our use case is that the cardinality of the space is too high for us to really cache directories, and so we’d be querying the directory metadata for virtually every interaction with the message store. That would almost certainly create a hotspot in the database, and so I think it doesn’t make sense for our use case.
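To make the caching argument concrete, here is a rough sketch of a client-side LRU cache of directory prefixes. It does not touch a real database; the function name, the cache size, and the uniform access pattern are all assumptions made up for illustration, and real traffic is usually more skewed than this.

```python
import random
from collections import OrderedDict


def simulate_directory_cache(num_inboxes, cache_size, num_requests, seed=0):
    """Return the fraction of requests that miss a client-side LRU cache.

    Each miss stands in for one read of the directory metadata.
    Assumption: inboxes are accessed uniformly at random.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # inbox id -> cached prefix (value is irrelevant here)
    misses = 0
    for _ in range(num_requests):
        inbox = rng.randrange(num_inboxes)
        if inbox in cache:
            cache.move_to_end(inbox)  # mark as most recently used
        else:
            misses += 1
            cache[inbox] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses / num_requests


# Many more inboxes than cache slots: almost every request misses.
high_cardinality_miss_rate = simulate_directory_cache(1_000_000, 10_000, 50_000)

# A handful of "large, static" directories: misses happen only on first use.
low_cardinality_miss_rate = simulate_directory_cache(100, 10_000, 50_000)
```

Under these assumptions the high-cardinality case misses on nearly every request, which is the situation described above, while a small fixed set of directories is cached after the first touch.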
I agree. I would just use a single directory for the entity type, and append a tuple to the directory subspace that represents a composite primary key with the inbox ID and the message ID. The Redwood storage engine has prefix compression, so if the only reason for using directories is saving space, then use Redwood instead for the same result with less complexity.
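A minimal sketch of that composite-key layout, using plain Python bytes and a sorted list as a stand-in for the ordered key space. The prefix value, the helper names, and the fixed-width encoding are assumptions for illustration; this is not the tuple layer's actual encoding, and with the real bindings you would pack the tuple through the directory's subspace instead.

```python
import bisect
import struct
import uuid

# Hypothetical short prefix standing in for the single "messages" directory.
MESSAGES_PREFIX = b"\x15\x2a"


def message_key(inbox_id: uuid.UUID, message_id: int) -> bytes:
    """Build prefix + 16-byte inbox UUID + 8-byte big-endian message id.

    Big-endian keeps byte order equal to numeric order for message ids.
    """
    return MESSAGES_PREFIX + inbox_id.bytes + struct.pack(">Q", message_id)


def read_inbox(sorted_keys, inbox_id: uuid.UUID):
    """Return the message ids of one inbox via a prefix scan over sorted keys."""
    prefix = MESSAGES_PREFIX + inbox_id.bytes
    i = bisect.bisect_left(sorted_keys, prefix)  # first key >= prefix
    message_ids = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        (message_id,) = struct.unpack(">Q", sorted_keys[i][len(prefix):])
        message_ids.append(message_id)
        i += 1
    return message_ids


inbox_a = uuid.UUID("1944f2cd-e3c1-4f24-b031-ea5c231624de")
inbox_b = uuid.UUID("74e1106e-4962-4622-8369-ad8b2ee1cc53")

# Keys inserted out of order; the store keeps them sorted by bytes.
keys = sorted([
    message_key(inbox_a, 2),
    message_key(inbox_b, 77),
    message_key(inbox_a, 1),
    message_key(inbox_b, 89),
])

inbox_a_messages = read_inbox(keys, inbox_a)
inbox_b_messages = read_inbox(keys, inbox_b)
```

All messages for one inbox sit next to each other in key order, so a single range read fetches an inbox without any per-inbox directory lookup. The cost is that each key repeats the full inbox ID, which is where prefix compression in the storage engine helps.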
Btw, if you are building an inbox service, check out the “Task Queue” feature in my extension’s messaging layer. It uses a combination of watches, partition allocators and versionstamps to build distributed task inbox consumers that process data fairly between inboxes (identified by the message key).