Abstracting hundred-thousands of collections of items

janderland · July 29, 2019, 7:52pm

I want to create an FDB instance holding several hundred thousand collections of items. Each of these collections has a name and needs to be re-namable. I’d appreciate the community’s opinion on this.

1. Separate Name Index

The first idea is to use a separate index for the name. The main data entries would look like this:

(app_dir, collections_dir, 1, item1) = data1
(app_dir, collections_dir, 1, item2) = data2
...

Then the index would look like this:

(app_dir, names_dir, collection_name) = 1
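A minimal sketch of why this layout makes renaming cheap. It uses a plain Python dict as a stand-in for the keyspace, so it is not the FDB API, and all helper names are hypothetical. Renaming touches only the index entry; the item keys embed the numeric ID and stay where they are.

```python
# In-memory stand-in for the keyspace: tuple keys -> values.
# This is NOT the FoundationDB API; in FDB each function body would
# run inside a single transaction.
kv = {}

def create_collection(name, coll_id):
    key = ("app", "names", name)
    if key in kv:
        raise KeyError(f"collection {name!r} already exists")
    kv[key] = coll_id

def put_item(name, item, data):
    # Resolve the name to its ID, then write under the ID.
    coll_id = kv[("app", "names", name)]
    kv[("app", "collections", coll_id, item)] = data

def get_item(name, item):
    coll_id = kv[("app", "names", name)]
    return kv[("app", "collections", coll_id, item)]

def rename_collection(old, new):
    # Only the index entry moves; no item key is rewritten.
    if ("app", "names", new) in kv:
        raise KeyError(f"collection {new!r} already exists")
    kv[("app", "names", new)] = kv.pop(("app", "names", old))

create_collection("invoices", 1)
put_item("invoices", "item1", "data1")
rename_collection("invoices", "bills")
print(get_item("bills", "item1"))
```

The cost is one extra lookup (name to ID) before every read or write of an item.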

2. Collection Directories

The second idea is to use a directory for each collection, since directories are inherently re-namable:

(app_dir, collections_dir, collection1_dir, item1) = data1
(app_dir, collections_dir, collection1_dir, item2) = data2
...

I’m leaning towards the first plan because I don’t know of a way to stream a list of subdirectory names. One of the things I need to be able to do is list all collection names. Because several hundred thousand names may fail to be read in a single transaction, I wanted the ability to perform a long multi-transaction range-read on the contents of the collections_dir. The directory API doesn’t currently provide this, correct?
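With the first plan, the multi-transaction listing could be sketched as a paginated scan that resumes just after the last name seen. In the sketch below a sorted list stands in for the keys under names_dir, and each `read_batch` call stands in for one short transaction; both names are hypothetical and none of this is the FDB API. Because the batches come from different transactions, the result is not a consistent snapshot: a collection created or renamed mid-scan may be missed or may show up under both names.

```python
import bisect

# Sorted list standing in for the names stored under names_dir.
names_index = sorted(f"collection-{i:06d}" for i in range(2500))

def read_batch(after, limit):
    """Stand-in for one short transaction.

    Returns up to `limit` names strictly greater than `after`
    (or from the start of the index when `after` is None).
    """
    start = 0 if after is None else bisect.bisect_right(names_index, after)
    return names_index[start:start + limit]

def list_names(batch_size=1000):
    """Yield every name, one bounded batch at a time."""
    last = None
    while True:
        batch = read_batch(last, batch_size)
        if not batch:
            return
        yield from batch
        last = batch[-1]  # resume point for the next batch

all_names = list(list_names(batch_size=1000))
print(len(all_names))
```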

ryanworl · July 29, 2019, 8:17pm

If you choose to go the do-it-yourself route and create an index of names to internal IDs, you should check out the metadata version feature. This will allow you to keep a consistent cache on the client of names to IDs and know when 1) collections are added or removed, and 2) collections are re-named.

This will decrease the latency of your transactions if each requires mapping a collection name to ID before doing anything else.

The directory layer currently doesn’t implement this, so that is a benefit to doing it yourself here.
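Roughly, the caching pattern looks like the sketch below. It is an in-memory model rather than the FDB API: `Store.version` plays the role of the metadata version and is assumed to be bumped by every transaction that adds, removes, or renames a collection. The class and method names are hypothetical.

```python
class Store:
    """Toy stand-in for the database: a name index plus a version counter."""

    def __init__(self):
        self.names = {}
        self.version = 0  # plays the role of the metadata version

    def set_name(self, name, coll_id):
        self.names[name] = coll_id
        self.version += 1  # every name change bumps the version

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)
        self.version += 1


class NameCache:
    """Client-side cache of name -> ID, reloaded only when the version moves."""

    def __init__(self, store):
        self.store = store
        self.seen_version = None
        self.names = {}
        self.reloads = 0

    def lookup(self, name):
        # Cheap version check on every use; full reload only on change.
        if self.seen_version != self.store.version:
            self.names = dict(self.store.names)
            self.seen_version = self.store.version
            self.reloads += 1
        return self.names.get(name)


store = Store()
store.set_name("invoices", 1)
cache = NameCache(store)
cache.lookup("invoices")
cache.lookup("invoices")      # served from the cache, no reload
store.rename("invoices", "bills")
print(cache.lookup("bills"), cache.reloads)
```

In a real client the version check would happen inside each transaction, so a stale cache entry is never used to address data.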

alexmiller · July 29, 2019, 9:23pm

Without looking at your proposed solutions, my thought would be to use a random ID / high contention allocator to allocate IDs, and then maintain a separate index that maps ID to name. This appears to be your (1) solution.

I vaguely recall some weird caveats about directories being renamed, but it’s left my mental cache… @alloc would probably know.
