Redwood Storage Engine documentation for 7.1 is missing

Here is the URL:

https://apple.github.io/foundationdb/redwood.html

Is there information somewhere else besides the talks from 2019?

Not yet, I’ll be updating that page soon, hopefully this week.

Here’s a status update: Redwood in FDB 7.1 (both release-7.1 and snowflake/release-7.1) is the first production-ready version and its on-disk format will be supported across upgrades going forward. It was still marked experimental out of caution since it hadn’t been used in production yet when the release was cut.

Since then, Snowflake has tested Redwood on many production clusters using Testing Storage Server mode, and more than a dozen of those clusters now use Redwood as their primary storage engine.


@SteavedHams So for FDB 7.1.25, the default storage engine is still SQLite, right? If we want to enable the Redwood storage engine, is there a switch or parameter to activate it? And when will the detailed documentation for the Redwood storage engine be ready? Thanks!


I believe you’re using the operator? If so, you just have to add the storage engine string to the supported storage engines (here: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/22c93ecdac5b880044cf4f8a50c285e59677715e/api/v1beta2/foundationdb_database_configuration.go#L725-L740). In the current version, the operator does not support setting the Redwood storage engine.


@johscheuer Thanks for your suggestion. Which storage engine string should be added here?
I just found the following code:
…
```go
const (
	// StorageEngineSSD defines the storage engine ssd.
	StorageEngineSSD StorageEngine = "ssd"
	// StorageEngineSSD2 defines the storage engine ssd-2.
	StorageEngineSSD2 StorageEngine = "ssd-2"
	// StorageEngineMemory defines the storage engine memory.
	StorageEngineMemory StorageEngine = "memory"
	// StorageEngineMemory2 defines the storage engine memory-2.
	StorageEngineMemory2 StorageEngine = "memory-2"
	// StorageEngineRocksDbExperimental defines the storage engine ssd-rocksdb-experimental.
	StorageEngineRocksDbExperimental StorageEngine = "ssd-rocksdb-experimental"
	// StorageEngineRocksDbV1 defines the storage engine ssd-rocksdb-v1.
	StorageEngineRocksDbV1 StorageEngine = "ssd-rocksdb-v1"
	// StorageEngineShardedRocksDB defines the storage engine ssd-sharded-rocksdb.
	StorageEngineShardedRocksDB StorageEngine = "ssd-sharded-rocksdb"
)
```
…
What does the Redwood engine string look like? Could you provide an example?
And after adding the storage engine string, we also need to add some logic to handle it, right? Is there an example?

We have now tested FDB 7.1 (default storage engine) against FDB 6.2 and found that with 100000 rows, 7.1 performs better than 6.2, but with 500000 rows, 6.2 performs much better than 7.1. Is this expected? You can see the details here (Is there any performance report for fdb7.1 vs fdb6.2? - #2 by liyyue). Thanks!

How about this:
```go
// StorageEngineRedwood1Experimental defines the storage engine ssd-redwood-1-experimental.
StorageEngineRedwood1Experimental StorageEngine = "ssd-redwood-1-experimental"
```
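To illustrate the kind of handling asked about earlier in the thread, here is a minimal sketch of an allow-list check that accepts the new constant. It assumes a plain string-based `StorageEngine` type like the one in the operator snippet; `supportedStorageEngines` and `isSupportedStorageEngine` are hypothetical names, not the operator's actual validation code.

```go
package main

import "fmt"

// StorageEngine mirrors the string-based type used in the operator snippet.
type StorageEngine string

const (
	// StorageEngineSSD2 defines the storage engine ssd-2.
	StorageEngineSSD2 StorageEngine = "ssd-2"
	// StorageEngineRedwood1Experimental defines the storage engine ssd-redwood-1-experimental.
	StorageEngineRedwood1Experimental StorageEngine = "ssd-redwood-1-experimental"
)

// supportedStorageEngines is a hypothetical allow-list; the real operator
// keeps its own list of accepted engine strings.
var supportedStorageEngines = map[StorageEngine]bool{
	StorageEngineSSD2:                 true,
	StorageEngineRedwood1Experimental: true,
}

// isSupportedStorageEngine is a hypothetical helper that reports whether
// an engine name is in the allow-list (missing keys yield false).
func isSupportedStorageEngine(engine StorageEngine) bool {
	return supportedStorageEngines[engine]
}

func main() {
	for _, e := range []StorageEngine{"ssd-2", "ssd-redwood-1-experimental", "ssd-redwood"} {
		fmt.Printf("%s supported: %v\n", e, isSupportedStorageEngine(e))
	}
}
```

With a map-based allow-list, supporting a new engine means adding one constant and one map entry, which keeps the check in a single place.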

I see that in operator 1.4.1, foundationdbclusters.yaml already includes Redwood, but type_StorageEngine_string does not:
```yaml
storage_engine:
  default: ssd-2
  enum:
    - ssd
    - ssd-1
    - ssd-2
    - memory
    - memory-1
    - memory-2
    - ssd-redwood-1-experimental
    - ssd-rocksdb-experimental
    - memory-radixtree-beta
    - custom
```

Redwood’s storage engine name is ssd-redwood-1-experimental in FDB 7.1 and 7.2. Despite the “experimental” label, this version of Redwood is production-ready, and Snowflake has been using it in production for about 6 months.

We will be dropping the -experimental suffix soon on the main branch, and starting with FDB 7.3 the name will simply be ssd-redwood-1, but for backward compatibility the ssd-redwood-1-experimental name will still be supported as an alias.
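Tooling that stores or compares engine names (for example, deciding whether a cluster's configuration already matches the desired one) may want to treat the two names as equivalent. A minimal sketch, assuming a client-side normalization step; `normalizeStorageEngine` is a hypothetical helper, not how FDB itself resolves the alias:

```go
package main

import "fmt"

// canonicalRedwoodName is the engine name stated for FDB 7.3 and later.
const canonicalRedwoodName = "ssd-redwood-1"

// legacyRedwoodAlias is the FDB 7.1/7.2 name, which remains accepted as an alias.
const legacyRedwoodAlias = "ssd-redwood-1-experimental"

// normalizeStorageEngine (hypothetical) maps the legacy Redwood alias onto
// the canonical name and returns every other name unchanged.
func normalizeStorageEngine(name string) string {
	if name == legacyRedwoodAlias {
		return canonicalRedwoodName
	}
	return name
}

func main() {
	for _, n := range []string{"ssd-redwood-1-experimental", "ssd-redwood-1", "ssd-2"} {
		fmt.Printf("%s -> %s\n", n, normalizeStorageEngine(n))
	}
}
```

Note that against FDB 7.1 or 7.2 only the -experimental name is valid, so such a helper should only be applied when comparing names, not when sending them to an older cluster.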


@SteavedHams Thanks for your suggestion! Is there a performance report comparing the Redwood storage engine with the SQLite engine? How much performance improvement can we expect after changing the storage engine?

It really depends on your workload, it could be anywhere from 25% to 1300% more throughput.
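Assuming "X% more throughput" means a gain on top of the SQLite baseline (an interpretation, not spelled out in the reply), the two endpoints convert to throughput multipliers as follows:

```latex
\text{multiplier} = 1 + \frac{p}{100}, \qquad
p = 25 \;\Rightarrow\; 1.25\times, \qquad
p = 1300 \;\Rightarrow\; 14\times
```

Under that reading, the upper end corresponds to roughly 14 times SQLite's throughput rather than 13 times.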

The worst case would be very small KV pairs with reads and writes of random keys that are not adjacent or nearby and do not repeat close together in time. Many of the differences between Redwood and SQLite do not matter for this workload; however, Redwood still has more CPU-efficient reads and page modifications, and more cache-miss parallelism on the write path, so you will still see a gain.

You will see a much larger gain if you use larger KV pairs or have more key locality. Range reads will have lower latency due to prefetching. Data movement on a non-idle cluster will be up to 10x faster (on a fully idle cluster, data movement is slow regardless of storage engine due to a bug to be fixed in 7.3).
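When benchmarking, it can help to generate both access patterns described above explicitly. A minimal sketch, assuming fixed-width hex keys and a hypothetical pair of helpers (`randomKeys`, `localKeys`); it only produces key strings and does not talk to FDB:

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomKeys returns n fixed-width keys drawn uniformly at random, so
// consecutive keys are almost never adjacent: the low-locality pattern
// described as Redwood's worst case.
func randomKeys(n int, seed int64) []string {
	r := rand.New(rand.NewSource(seed))
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("%016x", r.Uint64())
	}
	return keys
}

// localKeys returns n adjacent keys starting at start. Zero-padded hex keeps
// byte-wise (FDB key) order equal to numeric order, so these keys are neighbors.
func localKeys(n int, start uint64) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("%016x", start+uint64(i))
	}
	return keys
}

func main() {
	fmt.Println("random:", randomKeys(3, 42))
	fmt.Println("local: ", localKeys(3, 1000))
}
```

Running the same read/write mix over both key sets (and varying value sizes) should show where a given workload falls within the range quoted above.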

Throughput aside, under heavy workloads Redwood is much more stable thanks to IO operation prioritization, write-path IO parallelism, and spreading writes out over time. Storage servers running Redwood are far less likely than with the SQLite engine to fall behind in applying their mutation stream because of reads, and less likely to have read latency spikes caused by write flush bursts.


@SteavedHams Thanks for the information! I will do more testing to confirm it.

It seems the operator actually does support setting the Redwood storage engine (this is the line that adds the check for the storage engine string: foundationdb_database_configuration.go at 22c93ecdac5b880044cf4f8a50c285e59677715e in FoundationDB/fdb-kubernetes-operator on GitHub). I have never built a cluster with Redwood, so I wasn't aware that the operator already supports it.


@SteavedHams do you have an update on when the documentation page will be ready?

Thank you!

I’ve been taking measurements inside our application to quantify the storage overhead of MySQL InnoDB versus Redwood for (roughly) equivalent data, and so far I’ve been limited to empirical measurements.

However, I was hoping to read more about Redwood's B-tree structure, but this documentation still appears to be missing. Any idea when it will be added, or where I can read more?
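For empirical comparisons like this, one common approach is to express each engine's footprint as a ratio of measured on-disk bytes to the logical size of the data written. A minimal sketch, assuming you already have a disk-usage figure from your own measurement; `kv`, `logicalBytes`, and `overheadRatio` are hypothetical names. If the disk figure is summed across all FDB replicas, divide it by the replication factor before comparing with a single-copy InnoDB measurement.

```go
package main

import "fmt"

// kv is a hypothetical logical record: the key and value bytes the application writes.
type kv struct {
	key, value []byte
}

// logicalBytes sums key and value lengths, ignoring any engine or index overhead.
func logicalBytes(records []kv) int64 {
	var total int64
	for _, r := range records {
		total += int64(len(r.key) + len(r.value))
	}
	return total
}

// overheadRatio compares measured on-disk bytes with the logical size; a
// value of 1.5 means the engine uses 50% more space than the raw data.
// It returns 0 when there is no logical data to compare against.
func overheadRatio(diskBytes, logical int64) float64 {
	if logical == 0 {
		return 0
	}
	return float64(diskBytes) / float64(logical)
}

func main() {
	records := []kv{
		{[]byte("user/0001"), []byte("alice")},
		{[]byte("user/0002"), []byte("bob")},
	}
	logical := logicalBytes(records)
	// The disk figure here is a placeholder; use your own measured value.
	fmt.Printf("logical=%d ratio=%.2f\n", logical, overheadRatio(42, logical))
}
```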
