Seeking to understand and fix open rocksdb storage engine issues

Hi.

I’m working on a project that uses FoundationDB with a reasonably heavy random-write workload. Because of that, we are using the rocksdb storage engine; the default storage engine seems to have (understandably) more difficulty with this write pattern.
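Roughly, the shape of the writes is something like the sketch below (Python bindings; the key prefix, key distribution, and value size are illustrative, not our actual schema):

```python
# Illustrative sketch of a heavy random-write workload (not our real schema).
# Assumes the FoundationDB Python bindings are installed and a cluster file is reachable.
import os
import fdb

fdb.api_version(710)  # adjust to your installed client version
db = fdb.open()

@fdb.transactional
def write_batch(tr, n=100):
    # Each transaction writes a batch of uniformly random, uncorrelated keys.
    for _ in range(n):
        key = b"data/" + os.urandom(16)   # random keys, no locality
        tr[key] = os.urandom(200)         # small-ish values

for _ in range(1000):
    write_batch(db)
```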

Other posts allude to open issues with the rocksdb storage engine. Is there a succinct description of those issues (bugs/tickets, ideally with repro steps, or anything of that sort)? We’d be interested in understanding and possibly fixing those issues. For context, some of us have prior experience developing other databases.

Also, what are the differences between ssd-rocksdb-v1 and ssd-sharded-rocksdb? I could make a guess from the code, but I’m curious what specifically motivated two rocksdb storage formats.

Thanks.

Did you try Redwood?

I strongly suggest you try the Redwood storage engine in FDB 7.1 or later. Snowflake has been running it in production since 2022 and across its entire fleet since last year. It has no known corruption bugs; no wrong results or data corruption have ever been observed. Compared to the default engine, Redwood uses less CPU and less IO while providing lower read latency, higher read and write throughput, and faster startup.
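To try it, you switch the storage engine with fdbcli’s configure command, for example from a small maintenance script. A minimal sketch; the engine name below is the 7.1+ name as I recall it, so please verify it against `configure` help and the release notes for your version:

```python
# Sketch: switching the cluster's storage engine to Redwood via fdbcli.
# Assumes fdbcli is on PATH and the default cluster file is used.
# The engine name "ssd-redwood-1" is from memory; verify for your FDB version.
import subprocess

subprocess.run(
    ["fdbcli", "--exec", "configure ssd-redwood-1"],
    check=True,
)
```

Keep in mind that changing the storage engine triggers a gradual migration of all data onto new storage files, so plan for the data movement that follows.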

For random writes in particular, one issue you would hit with the default engine is that it applies version-ordered updates serially, which incurs serial disk read latency when uncached records are updated. Redwood does not have this issue: it batches commits over a version range and does a shared traversal to all update locations in parallel, minimizing tree node visits and parallelizing disk IO.
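To illustrate the difference (a purely conceptual sketch, not FDB or Redwood code): if every versioned update must first read its target page, applying them one at a time pays one disk round trip per uncached record, whereas batching a version range lets all of those reads be issued concurrently.

```python
# Conceptual sketch only: serial vs. batched application of updates that each
# need to read an uncached page first. Not actual storage engine code.
import asyncio
import random

PAGE_READ_LATENCY = 0.001  # pretend each uncached page read costs 1 ms

async def read_page(page_id):
    await asyncio.sleep(PAGE_READ_LATENCY)  # stand-in for a disk read
    return {}

async def apply_serially(updates):
    # One read at a time: total latency ~ len(updates) * PAGE_READ_LATENCY.
    for page_id, mutation in updates:
        page = await read_page(page_id)
        page.update(mutation)

async def apply_batched(updates):
    # Batch the version range and issue all page reads concurrently:
    # total latency ~ one disk round trip (plus CPU).
    pages = await asyncio.gather(*(read_page(p) for p, _ in updates))
    for page, (_, mutation) in zip(pages, updates):
        page.update(mutation)

updates = [(random.randrange(10**6), {"k": "v"}) for _ in range(256)]
asyncio.run(apply_serially(updates))   # ~256 ms of simulated IO
asyncio.run(apply_batched(updates))    # ~1 ms of simulated IO
```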

Redwood is also very good at range reads and writes. Even if your workload does not do these, FDB data movement uses them to move/copy shard replicas around within the cluster, so range performance is critical to fast healing times and to mitigating load or space hotspots. Disk write amplification for data movement with Redwood is ~1, and it accumulates zero compaction debt because Redwood is not an LSM.
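For reference, "range reads and writes" here means the kind of operations the client bindings expose directly, e.g. in Python (the key prefix is illustrative):

```python
# Illustrative range read and range clear using the Python bindings.
import fdb

fdb.api_version(710)  # adjust to your installed client version
db = fdb.open()

@fdb.transactional
def scan_prefix(tr, prefix):
    # Read every key-value pair under a prefix as one range read.
    return [(kv.key, kv.value) for kv in tr.get_range_startswith(prefix)]

@fdb.transactional
def clear_prefix(tr, prefix):
    # Remove the whole prefix with a single range clear.
    tr.clear_range_startswith(prefix)

rows = scan_prefix(db, b"data/")
clear_prefix(db, b"data/")
```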

I apologize that there isn’t more documentation on Redwood yet; it’s been on my to-do list for a long time, but unfortunately I’ve been occupied with other projects and life things.

Operationally, the most important thing to know about Redwood is that it does not shrink its data file; freed space is kept internally and reused when needed. So when looking at free space, the metric to use in cluster status JSON or in the trace log StorageMetrics events is the “available” bytes metric, which is the sum of free space on the file system and free reusable blocks within the .redwood-v1 data file.
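For example, one quick way to pull those numbers out of status JSON; the field names below (kv_store_available_bytes, kv_store_total_bytes) are from memory, so verify them against your cluster’s actual status output:

```python
# Sketch: extract per-storage-process KV store space metrics from `status json`.
# Field names are as I recall them; verify against your cluster's output.
import json
import subprocess

out = subprocess.run(
    ["fdbcli", "--exec", "status json"],
    check=True, capture_output=True, text=True,
).stdout
status = json.loads(out)

for pid, proc in status["cluster"]["processes"].items():
    for role in proc.get("roles", []):
        if role.get("role") == "storage":
            avail = role.get("kv_store_available_bytes")
            total = role.get("kv_store_total_bytes")
            print(pid, avail, total)
```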