RocksDB backend

doublemax · November 5, 2018, 1:22pm

Are there any plans to support an LSM-tree storage engine such as RocksDB as a compile-time option?

yangruifeng · November 6, 2018, 7:23am

Hi, here is a demo that uses RocksDB as the backend:
rocksdb backend for foundationdb

ryanworl · November 6, 2018, 12:20pm

This post explains why RocksDB, and other existing storage engines, are not included as options. They do not work with the simulation testing framework: because they do synchronous IO, they require thread pools, whereas the simulation testing framework requires that every part of the database be able to run in a single thread.

That doesn’t mean you can’t or shouldn’t write an adapter for RocksDB if you personally trust it, but you would need to do the testing yourself to verify things worked as expected.
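To make the single-thread requirement concrete, here is a minimal sketch of a deterministic run loop. It is not FDB code: `ToySimulator` and `runTrace` are hypothetical names, and the real simulator does far more. The idea it illustrates is that when every task runs on one thread and a seeded RNG picks the next task, the same seed replays the same interleaving. Once work is handed to OS threads, the kernel scheduler decides the interleaving and that replay property is lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy simulator (not FDB's): all pending tasks sit in one
// container and a seeded RNG decides which one runs next.
class ToySimulator {
public:
    explicit ToySimulator(std::uint32_t seed) : rng_(seed) {}

    // Queue a task; nothing runs until run() is called.
    void schedule(std::function<void()> task) { pending_.push_back(std::move(task)); }

    // Run every queued task on the calling thread, in RNG-chosen order.
    void run() {
        while (!pending_.empty()) {
            std::size_t i = rng_() % pending_.size();
            std::function<void()> task = std::move(pending_[i]);
            pending_.erase(pending_.begin() + static_cast<std::ptrdiff_t>(i));
            task();
        }
    }

private:
    std::mt19937 rng_;  // seeded, so the sequence of choices is reproducible
    std::vector<std::function<void()>> pending_;
};

// Schedules four tasks named a..d and returns the order in which they ran.
std::string runTrace(std::uint32_t seed) {
    const std::string names = "abcd";
    ToySimulator sim(seed);
    std::string trace;
    for (char c : names) {
        sim.schedule([&trace, c] { trace.push_back(c); });
    }
    sim.run();
    assert(trace.size() == names.size());  // every task ran exactly once
    return trace;
}
```

Calling `runTrace` twice with the same seed returns the same string, which is the property a storage engine with its own thread pool cannot offer to the simulator.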

doublemax · November 6, 2018, 1:05pm

@ryanworl

Thanks for pointing that out.

davelester · November 6, 2018, 2:59pm

Any plans to contribute this back to the OSS project, or thoughts on whether you’re looking to maintain this outside of the demo?

doublemax · November 6, 2018, 7:04pm

@yangruifeng
How did you solve the async issue?
A dedicated thread?
The async-patch was never applied to RocksDB.

alexmiller · November 6, 2018, 7:47pm

@yangruifeng, would you be interested in seeing whether RocksDB can be adapted to run in a more evented style, integrated into the main FDB run loop, so that it could be merged into mainline FDB? For the sqlite storage engine, there’s a layer that adapts stackful coroutines into flow, CoroFlow.actor.cpp, that you might be able to use to similarly flow-ify RocksDB operations.

But like, well done though. I’m really happy to see that integrating in a new storage engine wasn’t too much work, and was feasible to do without help and guidance.

The code looks like all operations are tossed onto a rocksdb::ThreadPool.

doublemax · November 6, 2018, 7:48pm

WiredTiger (also LSM) would support async ops, but I don’t know if range deletes are possible.

yangruifeng · November 7, 2018, 2:47am

@doublemax @alexmiller
Yes, a dedicated thread pool (rocksdb::ThreadPool).

@alexmiller
Do you mean like this?
This is my old design, but I worry that RocksDB write stalls would affect the main thread.

@davelester
The code is rough and any suggestions are welcome; after that I can contribute this back to the OSS project.

ricky.saltzer · November 8, 2018, 10:44pm

Would there be any benefit to having an LMDB-based backend as well?

alexmiller · November 9, 2018, 7:45am

There was a RocksDB meetup today, and Nutanix presented some very similar work that they did in dispatching RocksDB reads and writes from a pool of user-level threads / fibers. I’m hoping slides will show up soon that I can point you towards.

Your diagram is roughly the layering I’d expect this to have. You’re going to be following a general pattern of getting RocksDB operations to run in a coroutine: whenever one of them would make a blocking call, call into Flow code that returns a future indicating when the coroutine can resume, then suspend the coroutine and switch to something else. CoroFlow, and its existing usage by sqlite, should be of help in figuring out how to do that.
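As a rough illustration of that suspend-and-resume shape, here is a sketch that swaps in plain callbacks (continuation passing) for real coroutines to stay short. `ToyAsyncFile` is a hypothetical stand-in and is not FDB’s IAsyncFile or anything from CoroFlow. The point is only that a read never blocks the thread: it registers what should happen when the data arrives, control returns to the run loop, and the continuation runs later.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical async file: reads are recorded and completed later by the
// run loop instead of blocking the caller.
class ToyAsyncFile {
public:
    using Callback = std::function<void(const std::string&)>;

    explicit ToyAsyncFile(std::map<std::string, std::string> data)
        : data_(std::move(data)) {}

    // Never blocks: remembers the request and returns immediately.
    void readAsync(const std::string& key, Callback onDone) {
        assert(static_cast<bool>(onDone));  // a continuation is required
        inflight_.emplace_back(key, std::move(onDone));
    }

    // Completes the oldest outstanding read, if any, by invoking its
    // continuation. Returns false when nothing is in flight.
    bool completeOne() {
        if (inflight_.empty()) return false;
        std::pair<std::string, Callback> req = std::move(inflight_.front());
        inflight_.pop_front();
        auto it = data_.find(req.first);
        req.second(it == data_.end() ? std::string() : it->second);
        return true;
    }

private:
    std::map<std::string, std::string> data_;
    std::deque<std::pair<std::string, Callback>> inflight_;
};

// Issues two reads, then lets the "run loop" resume them in order.
std::string demoSuspendResume() {
    std::map<std::string, std::string> data{{"k1", "v1"}, {"k2", "v2"}};
    ToyAsyncFile file(data);
    std::string log;
    file.readAsync("k1", [&log](const std::string& v) { log += "got:" + v + ";"; });
    file.readAsync("k2", [&log](const std::string& v) { log += "got:" + v + ";"; });
    log += "issued;";  // both reads are pending; the caller was never blocked
    while (file.completeOne()) {
    }
    return log;
}
```

With stackful coroutines the continuation would be the rest of the suspended RocksDB call stack rather than a lambda, but the control flow is the same.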

I think the rough concrete outline of work would be to define a new rocksdb::Env that implements:

most functions with their platform::* equivalent
a new implementation of SequentialFile and RandomAccessFile that wrap IAsyncFile
schedule() as a function that spawns a new coroutine

And then see what fails. I’d probably start by defining a Flow unit test that initializes a FlowRocksDBEnv and calls get() on a random key, and start implementing the minimal things you need to get that working.

This strongly relies upon the assumption that all blocking operations that RocksDB does somehow go through rocksdb::Env. Anything from memory-mapped files, to sleep() calls, to mutex usage, would invalidate this assumption, and probably make integrating RocksDB into deterministic simulation infeasible.
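To show what that assumption buys, here is a minimal sketch with hypothetical types. `ToyEnv`, `SimEnv`, and `ToyEngine` are invented for illustration; `ToyEnv` is not rocksdb::Env and does not mirror its real signatures. The engine never sleeps or spawns work itself, it only asks the environment, so a simulation environment can queue background work on the calling thread and advance a virtual clock instead of blocking.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical environment interface: the only way the engine is allowed
// to schedule background work, read the clock, or sleep.
class ToyEnv {
public:
    virtual ~ToyEnv() = default;
    virtual void schedule(std::function<void()> work) = 0;
    virtual std::uint64_t nowMicros() = 0;
    virtual void sleepForMicros(std::uint64_t micros) = 0;
};

// Simulation flavour: work is queued on the calling thread and "sleeping"
// just advances a virtual clock.
class SimEnv : public ToyEnv {
public:
    void schedule(std::function<void()> work) override {
        assert(static_cast<bool>(work));
        queue_.push_back(std::move(work));
    }
    std::uint64_t nowMicros() override { return now_; }
    void sleepForMicros(std::uint64_t micros) override { now_ += micros; }

    // Drain queued work in FIFO order on the current thread.
    void runUntilIdle() {
        while (!queue_.empty()) {
            std::function<void()> work = std::move(queue_.front());
            queue_.pop_front();
            work();
        }
    }

private:
    std::uint64_t now_ = 0;
    std::deque<std::function<void()>> queue_;
};

// Hypothetical engine: buffers writes and flushes them as background work
// requested through the environment.
class ToyEngine {
public:
    explicit ToyEngine(ToyEnv& env) : env_(env) {}

    void put(int value) {
        buffer_ += value;
        env_.schedule([this] { flush(); });
    }
    int flushed() const { return flushed_; }

private:
    void flush() {
        env_.sleepForMicros(100);  // pretend IO latency, via the environment
        flushed_ += buffer_;
        buffer_ = 0;
    }

    ToyEnv& env_;
    int buffer_ = 0;
    int flushed_ = 0;
};
```

If `flush()` called a real sleep or took a mutex behind the environment’s back, `SimEnv` would have no way to intercept it, which is exactly the failure mode described above.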

The above defines the work that would need to be done in order to integrate RocksDB into FDB’s deterministic simulation tests. For running with RocksDB on real world clusters, I’m concerned that trying to pack the CPU load of compactions onto the same thread that serves user read requests would be impractical. I suspect that although packing everything into one thread will be needed for testing, we’ll need to run compactions on background threads in production, but we can figure that out once it’s shown that we can solve the testing side of this problem.

doublemax · November 11, 2018, 6:15pm

SQLite4 is also an LSM (with fast range-delete).
Maybe FDB should simply change to SQLite4?
https://sqlite.org/src4/doc/trunk/www/lsmusr.wiki

KrzysFR · November 11, 2018, 8:52pm

It seems that work on sqlite4’s LSM engine has stopped for some time now:

Last commit: https://sqlite.org/src4/info/c0b7f14c0976ed5e

yangruifeng · November 12, 2018, 2:09am

The LSM engine has been folded into SQLite3.

doublemax · November 12, 2018, 8:43am

You are right. The LSM was merged into SQLite3 (interesting).
A list of possible storage engines: github.com/pmwkaa/engine.so

The LSM trees are: SQLite3, LevelDB/RocksDB and WiredTiger

doublemax · November 15, 2018, 7:24pm

Maybe it is easier to enable the LSM tree of SQLite?
https://www.sqlite.org/cgi/src/dir?ci=9b37bbf5f338dea9&name=ext/lsm1
