FoundationDB for embedded use

(Eric S.) #1

I like the layer model FoundationDB uses to build multi-paradigm features. It would be interesting to use FoundationDB in an embedded app to get the layer model without the full multi-node replication features. Are there plans to support this in the future?

I suspect it would be relatively easy to port to an embedded library. In the embedded database scenario there may be a need to open the same database from multiple processes, for example for app extensions on iOS and macOS. The SQLite B-tree and LSM1 backends support multi-process access. Does FoundationDB have any issues with multiple processes opening the same key/value database?

(Steve Atherton) #2

It would be possible to create something that looks like an FDB embedded library without a lot of effort, though what you would end up with is basically a library which runs an in-process single instance FDB cluster and communicates with it using FDB’s intra-process communication path rather than TCP.

Such a library would basically be the FDB C library modified to include all of the server object files and to start up a server instance on the C library’s Flow network thread (the thread which would normally communicate with the remote cluster).

While you would get the benefit of using any FDB layers available for your application’s language, you would only have the performance of (at best) a single instance FDB deployment. FDB’s performance advantage is in its ability to scale to multiple instances; its single instance performance is not amazing.

Database files on disk can only be used by a single FDB server process, regardless of the multi-process abilities of the underlying storage engine. The storage engine is only involved in FoundationDB’s write path at the very end. The FDB transaction processing subsystem provides FDB storage servers with a stream of ordered mutations to apply. The mutations are committed to the on-disk database in batches, with each transaction in the underlying storage engine covering one or more FoundationDB commit versions.

FDB Storage Servers hold 5 seconds of versioned key/value history in memory, and only mutations older than that window are pushed to disk. So while it would technically be possible to open an SQLite database file from another process simultaneously, it would have to be opened read-only and what you would see is a consistent view of the database’s state as of 5 seconds worth of versions ago.
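As a rough illustration of the behaviour described above, here is a toy Python model of a storage server that keeps recent versions in memory and only flushes older mutations to disk. Every name in it (`VersionedWindow`, `apply`, `flush`, `read`) is made up for this sketch, and the window is expressed as a plain number of versions chosen by the caller; it is not how fdbserver is actually implemented.

```python
class VersionedWindow:
    """Toy model: recent mutations stay in memory, older ones go to 'disk'."""

    def __init__(self, window):
        self.window = window        # how many versions to keep in memory (assumption)
        self.memory = []            # (version, key, value), in commit-version order
        self.disk = {}              # stands in for the on-disk storage engine
        self.durable_version = 0    # newest version that has reached "disk"

    def apply(self, version, key, value):
        # Mutations arrive as an ordered stream from the transaction subsystem.
        self.memory.append((version, key, value))

    def flush(self, latest_version):
        # Push everything older than the window to disk as one batch.
        cutoff = latest_version - self.window
        keep = []
        for version, key, value in self.memory:
            if version <= cutoff:
                self.disk[key] = value
                self.durable_version = version
            else:
                keep.append((version, key, value))
        self.memory = keep

    def read(self, key, read_version):
        if read_version < self.durable_version:
            raise ValueError("read version is older than the in-memory window")
        # Newest in-memory value at or below read_version wins, else use disk.
        for version, k, value in reversed(self.memory):
            if k == key and version <= read_version:
                return value
        return self.disk.get(key)


w = VersionedWindow(window=10)
w.apply(1, b"a", b"old")
w.apply(12, b"a", b"new")
w.flush(latest_version=12)
# Another process looking only at w.disk would see the older value.
```

In this model a second process reading the "disk" dictionary sees only what has aged out of the window, which is the stale but consistent view mentioned above.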

(David Scherer) #3

I think fdbserver wants too much memory to be a good choice in an embedded context. Maybe you could tune it to a good enough place. As Steve says you will not be able to share database files between processes.

You could build an embedded implementation of the FoundationDB API, sharing little of the distributed database implementation, so as to be able to use layer code. But you should be faithful to the transaction semantics, which will take some effort depending on what the underlying storage engine gives you.
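To make "faithful to the transaction semantics" a bit more concrete, here is a minimal Python sketch of an embedded key/value store with read-your-writes and an optimistic conflict check at commit time. The class and method names (`EmbeddedDB`, `Transaction`, `ConflictError`) are hypothetical and only loosely shaped like the FDB API. It is deliberately simplified: reads see the latest committed value rather than a true snapshot at the read version, so it only demonstrates the commit-time conflict check, not full MVCC.

```python
import threading


class ConflictError(Exception):
    """Raised at commit when a key this transaction read was changed by someone else."""


class EmbeddedDB:
    def __init__(self):
        self._data = {}
        self._last_write = {}   # key -> commit version that last wrote it
        self._version = 0
        self._lock = threading.Lock()

    def create_transaction(self):
        return Transaction(self)


class Transaction:
    def __init__(self, db):
        self._db = db
        with db._lock:
            self._read_version = db._version
        self._reads = set()
        self._writes = {}       # key -> value, None means "cleared"

    def get(self, key):
        if key in self._writes:             # read-your-writes
            return self._writes[key]
        self._reads.add(key)                # remember the read for conflict checking
        with self._db._lock:
            return self._db._data.get(key)

    def set(self, key, value):
        self._writes[key] = value           # buffered until commit

    def clear(self, key):
        self._writes[key] = None

    def commit(self):
        db = self._db
        with db._lock:
            # Fail if any key we read was written after our read version.
            for key in self._reads:
                if db._last_write.get(key, 0) > self._read_version:
                    raise ConflictError(key)
            if self._writes:
                db._version += 1
                for key, value in self._writes.items():
                    if value is None:
                        db._data.pop(key, None)
                    else:
                        db._data[key] = value
                    db._last_write[key] = db._version
```

A layer written against something like this would retry the whole transaction on `ConflictError`, much as layer code retries on conflicts with the real client.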

(Eric S.) #4

Thanks for the very detailed answers. I like the idea of building the layer model on a db suited for embedded use. I may look at that as a side project. Flow looks interesting and I’d like to dive in to it more.

(Christophe Chevalier) #5

I did exactly that for the same reason (and also by necessity): an embedded KV store with similar semantics to the real thing. I reused the existing client API as the base, and most of the existing application code was unchanged. The main difference is that it has a concurrent-reader / single-writer model, built on top of a copy-on-write B-tree that is mmap’ed in memory (in .NET).

But after a while, I observed something strange: when dealing with a remote database (like fdb), all your reads are async, and all the “writes” (Set/Clear/Atomic) are non-blocking until you commit. With an embedded K/V Store that uses memory mapped files, this is exactly the reverse: reads are non-blocking (if the data file is already cached in memory) and writes are blocking (because I used single-writer so they needed to acquire a lock).

So the base API was the same but with reversed async signatures (non-blocking reads, async writes): This meant that App and Layer code needed to be changed slightly, replacing all the Get(KEY) => Future<VALUE> with Get(KEY) => VALUE.

You could probably fudge the API by having local reads pretend to be async (returning a future that completes immediately) but this would probably add a lot of overhead: allocating the async machinery is not cheap especially when doing it millions of times per second.
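The "pretend to be async" approach can be sketched as follows. This is Python rather than the .NET code discussed here, and the function names (`get_sync`, `get_fake_async`, `compare`) are made up for the illustration; `concurrent.futures.Future` stands in for whatever future type the client API would return. The point is only that the wrapper allocates a future object per read even though the value is already available.

```python
import timeit
from concurrent.futures import Future


def get_sync(store, key):
    # Local read: the value is already in memory, so just return it.
    return store.get(key)


def get_fake_async(store, key):
    # Same read, wrapped in a future that is already completed on return.
    future = Future()
    future.set_result(store.get(key))
    return future


def compare(n=100_000):
    # Returns (seconds for n direct reads, seconds for n wrapped reads).
    store = {b"k": b"v"}
    sync_s = timeit.timeit(lambda: get_sync(store, b"k"), number=n)
    fake_s = timeit.timeit(lambda: get_fake_async(store, b"k").result(), number=n)
    return sync_s, fake_s
```

Calling `compare()` gives a rough feel for the per-read cost of the wrapper on a given machine; the actual ratio depends on the runtime and is not something this sketch can predict.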

Then there was the final realization that if the data is local anyway (mmap’ed) then why go through an API like Get(KEY) => VALUE that will return a copy of the value that is already in memory? Why not expose a pointer to the original data itself? That opens the door to zero-copy serialization, and boosts the performance even more.

After a while, the API did not look like the original any more: most read functions would take a lambda that gets a pointer to the data, and must “process” the data while under some lock before returning the result. This worked great for things like table scans with documents stored in some sort of binary JSON, like the format used by PostgreSQL.

  • I did a simple test by replicating the way SQL Server stores data on disk, and was getting exactly the same query speed for a non-trivial TABLE SCAN over several million rows. This was not magical: we were both stalled on the disk, which was acting as the bottleneck. I changed all the strings from UTF-16 to UTF-8 and got almost double the speed instantly!
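The lambda-based read API described above could look roughly like this in Python. `LocalStore` and its methods are hypothetical names for this sketch, and a `bytearray` stands in for the memory-mapped file. The handler receives a `memoryview` over the stored bytes (no copy) and must finish its work before the lock is released; the view is released afterwards, so a handler that leaks it cannot use it later.

```python
import threading


class LocalStore:
    def __init__(self):
        self._buf = bytearray()     # stand-in for the mmap'ed data file
        self._index = {}            # key -> (offset, length)
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._index[key] = (len(self._buf), len(value))
            self._buf += value      # append-only, each value is contiguous

    def read(self, key, handler):
        # Run handler(view) under the lock and return whatever it computes.
        with self._lock:
            offset, length = self._index[key]
            with memoryview(self._buf) as whole, whole[offset:offset + length] as view:
                return handler(view)


store = LocalStore()
store.put(b"doc:1", b'{"name": "x"}')
first_byte = store.read(b"doc:1", lambda view: view[0])   # reads in place, no copy
size = store.read(b"doc:1", len)
```

The handler returns a result computed from the data rather than the data itself, which is what keeps the read zero-copy.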

Also, I got rid of the 100K Value Size limit because why would I limit myself that way if everything is local and within a pointer’s reach? I had the B-Tree support large values and offer the guarantee that they would be contiguous in memory. This again simplified the code a lot: no need for chunking, parsers could work off a single byte* pointer, zero-copy structures could be cast from (void*) to (MyFooBar*) and back, etc…
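The cast from (void*) to (MyFooBar*) has a rough Python analogue in `ctypes`, shown here as a sketch. The `FooBar` layout and the `view_record` helper are invented for the example; the original was .NET code working on raw pointers. `from_buffer` does not copy: the structure shares memory with the underlying buffer, so the buffer must be writable and must outlive the structure.

```python
import ctypes
import struct


class FooBar(ctypes.LittleEndianStructure):
    # Hypothetical record layout: 4 + 4 + 8 bytes, naturally aligned, no padding.
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("score", ctypes.c_double),
    ]


def view_record(buf, offset=0):
    # Overlay a FooBar on the bytes at `offset` without copying them.
    return FooBar.from_buffer(buf, offset)


# A contiguous value as it might sit in the store.
value = bytearray(struct.pack("<IId", 7, 1, 1.5))
record = view_record(value)
record.id = 9   # writes through to the underlying bytes
```

This only works because the whole value is contiguous in memory, which is the guarantee the B-tree was changed to provide.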

I guess if FoundationDB supported the infamous ATOMIC_EXECUTE_LUA operation, you could get the benefit of being able to do some zero-copy stuff with your data (for reads at least), and make it look a lot more like something local (with only added latency).

So my conclusion is that having an embedded FoundationDB-like thing is really great but the existing API would feel a little bit limiting, and you’d beg for more features that would probably only make sense for local processing (zero-copy, large values, streaming parsers, …).

(Eric S.) #6

Interesting. I like the idea of an embedded database inspired by FDB that uses a different API for some of the reasons you mention. SQLite often feels like the wrong tool when you don’t need an RDBMS.