FoundationDB for embedded use

I did exactly that for the same reason (and also by necessity): an embedded KV store with semantics similar to the real thing. I reused the existing client API as the base, and most of the existing application code stayed unchanged. The main difference is that it uses a concurrent-reader / single-writer model, built on top of a copy-on-write B-tree that is mmap’ed in memory (in .NET).

But after a while, I observed something strange: when dealing with a remote database (like fdb), all your reads are async, and all the “writes” (Set/Clear/Atomic) are non-blocking until you commit. With an embedded K/V store that uses memory-mapped files, it is exactly the reverse: reads are non-blocking (if the data file is already cached in memory) and writes are blocking (because the single-writer model means they need to acquire a lock).

So the base API was the same but with reversed async signatures (non-blocking reads, async writes). This meant that app and layer code needed to be changed slightly, replacing every Get(KEY) => Future<VALUE> with Get(KEY) => VALUE.

You could probably fudge the API by having local reads pretend to be async (returning a future that completes immediately), but this would probably add a lot of overhead: allocating the async machinery is not cheap, especially when doing it millions of times per second.
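To make the "pretend to be async" idea concrete, here is a rough sketch in Python rather than .NET. The names (STORE, get_sync, get_fake_async, compare) are made up for illustration, and a plain dict stands in for the mmap’ed B-tree. Python coroutines are only an analogy for .NET tasks, so the absolute numbers will not transfer, but the shape of the problem is the same: every fake-async read allocates an awaitable before handing back a value that was already available.

```python
import asyncio
import time
from typing import List, Optional

# Hypothetical in-memory store standing in for the mmap'ed B-tree.
STORE = {b"key%d" % i: b"value%d" % i for i in range(1000)}


def get_sync(key: bytes) -> Optional[bytes]:
    """Local read: returns the value directly, no async machinery."""
    return STORE.get(key)


async def get_fake_async(key: bytes) -> Optional[bytes]:
    """Same read behind an async signature; it completes immediately."""
    return STORE.get(key)


def scan_sync(keys: List[bytes]) -> List[Optional[bytes]]:
    return [get_sync(k) for k in keys]


async def scan_fake_async(keys: List[bytes]) -> List[Optional[bytes]]:
    # Each call creates a coroutine object before producing the value.
    return [await get_fake_async(k) for k in keys]


def compare(rounds: int = 200):
    """Run both scans `rounds` times; return results and elapsed seconds."""
    keys = list(STORE)

    t0 = time.perf_counter()
    for _ in range(rounds):
        sync_result = scan_sync(keys)
    t1 = time.perf_counter()

    async def run():
        for _ in range(rounds):
            result = await scan_fake_async(keys)
        return result

    async_result = asyncio.run(run())
    t2 = time.perf_counter()
    return sync_result, async_result, t1 - t0, t2 - t1
```

Calling compare() returns the same values from both paths plus the two timings, so you can measure the gap on your own machine instead of trusting a quoted figure.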

Then there was the final realization that if the data is local anyway (mmap’ed) then why go through an API like Get(KEY) => VALUE that will return a copy of the value that is already in memory? Why not expose a pointer to the original data itself? That opens the door to zero-copy serialization, and boosts the performance even more.

After a while, the API did not look like the original any more: most read functions would take a lambda that receives a pointer to the data and must “process” the data while under some lock before returning the result. This worked great for things like table scans with documents stored in some sort of binary JSON, like the format used by PostgreSQL.
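The lambda-based read shape could look something like the sketch below. It is Python, not the original .NET code, and LocalStore, set, and read are hypothetical names. A bytearray stands in for the mapped file and a memoryview stands in for the raw pointer. For brevity it uses one lock for both readers and writers, which is simpler than the concurrent-reader model described above.

```python
import struct
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class LocalStore:
    """Hypothetical embedded store: all values live in one contiguous buffer."""

    def __init__(self) -> None:
        self._buf = bytearray()      # stand-in for the mmap'ed data file
        self._index = {}             # key -> (offset, length)
        self._lock = threading.Lock()

    def set(self, key: bytes, value: bytes) -> None:
        with self._lock:
            offset = len(self._buf)
            self._buf += value
            self._index[key] = (offset, len(value))

    def read(self, key: bytes, handler: Callable[[memoryview], T]) -> Optional[T]:
        """Call handler with a zero-copy view of the value, under the lock.

        The view is only valid inside the handler; it is released on return.
        """
        with self._lock:
            entry = self._index.get(key)
            if entry is None:
                return None
            offset, length = entry
            with memoryview(self._buf) as whole:
                with whole[offset:offset + length] as view:
                    return handler(view)


def example() -> Optional[int]:
    store = LocalStore()
    # A 4-byte little-endian counter followed by an opaque payload.
    store.set(b"doc:1", struct.pack("<I", 42) + b"payload")
    # The handler parses in place; only the small result leaves the lock.
    return store.read(b"doc:1", lambda v: struct.unpack_from("<I", v, 0)[0])
```

The point of the design is that the handler returns a small derived result (a field, a count, a match flag) and the bulk of the value is never copied out of the store.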

  • I did a simple test by replicating the way SQL Server stores data on disk, and was getting the exact same query speed for a non-trivial TABLE SCAN over several million rows. This was not magic: we were both stalled on the disk, which was the bottleneck. I changed all the strings from UTF-16 to UTF-8 and got almost double the speed instantly!
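The near-doubling is what you would expect if the strings were mostly ASCII (an assumption here, not something stated above): such text takes two bytes per character in UTF-16 and one in UTF-8, so a disk-bound scan has roughly half as many bytes to read. A quick way to check the ratio for your own data, with encoded_sizes being a made-up helper name:

```python
def encoded_sizes(text: str):
    """Return (utf16_bytes, utf8_bytes) for the given string."""
    # "utf-16-le" avoids counting the 2-byte byte-order mark.
    return len(text.encode("utf-16-le")), len(text.encode("utf-8"))


# Pure ASCII: UTF-8 is exactly half the size of UTF-16.
ascii_sizes = encoded_sizes("customer_name=John Smith")   # (48, 24)

# Accented Latin text: the saving shrinks, since "é" needs 2 bytes in UTF-8.
accented_sizes = encoded_sizes("caf\u00e9")                # (8, 5)
```

For text dominated by non-Latin scripts the ratio can go the other way, so the gain depends on what is actually stored.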

Also, I got rid of the 100K value size limit, because why would I limit myself that way if everything is local and within a pointer’s reach? I had the B-tree support large values and guarantee that they would be contiguous in memory. This again simplified the code a lot: no need for chunking, parsers could work off a single byte* pointer, zero-copy structures could be cast from (void*) to (MyFooBar*) and back, etc.
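As an illustration of the (void*) to (MyFooBar*) cast, here is roughly how the same trick looks in Python with ctypes over an anonymous memory mapping. The Header layout and the demo function are invented for this sketch and are not the structures from my store. The structure is overlaid on the mapped bytes, so nothing is copied and a write to the mapping is visible through the struct right away.

```python
import ctypes
import mmap
import struct


class Header(ctypes.Structure):
    """Hypothetical fixed-layout record header (8 bytes, native byte order)."""
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("flags", ctypes.c_uint16),
        ("length", ctypes.c_uint16),
    ]


def demo():
    # Anonymous mapping standing in for the mmap'ed data file.
    buf = mmap.mmap(-1, mmap.PAGESIZE)

    # Write a record header directly into the mapping.
    struct.pack_into("=IHH", buf, 0, 7, 1, 5)

    # "Cast" the mapped bytes to a Header: no copy, it shares memory with buf.
    header = Header.from_buffer(buf, 0)
    before = (header.id, header.flags, header.length)

    # Overwrite the id in the mapping; the struct sees the change immediately.
    struct.pack_into("=I", buf, 0, 99)
    after_id = header.id

    # Both objects are left to the garbage collector; the mapping cannot be
    # closed explicitly while the struct still points into it.
    return before, after_id
```

This only works cleanly because the value is contiguous: with chunked values the struct could straddle two chunks and you would be back to copying.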

I guess if FoundationDB supported the infamous ATOMIC_EXECUTE_LUA operation, you could get the benefit of doing some zero-copy stuff with your data (for reads at least), and make it look a lot more like something local (with only added latency).

So my conclusion is that having an embedded FoundationDB-like thing is really great, but the existing API would feel a little limiting, and you’d beg for more features that probably only make sense for local processing (zero-copy, large values, streaming parsers, …).