Wrapper storage engine for memory + SSD

How difficult would this be? Is it just a matter of creating a new IKeyValueStore implementation which holds an instance of both existing storage engines and passes method calls through to one or the other based on, e.g., a key prefix? This would make recovery from a cold start degrade to the speed of reading all of the memory engine's logs + snapshot off disk, but that is expected.
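
To make the idea concrete, here is a rough sketch of the pass-through wrapper. The real IKeyValueStore is a Flow/C++ interface built around futures, so none of the types or signatures below are FDB's; this only illustrates the prefix-based routing and the fan-out on commit:

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Deliberately simplified stand-in for a storage engine interface. The real
// IKeyValueStore returns futures and uses KeyRef/KeyRangeRef arguments; none
// of the signatures below are FDB's.
struct SimpleKVStore {
    virtual void set(const std::string& key, const std::string& value) = 0;
    virtual std::string get(const std::string& key) = 0;
    virtual void clear(const std::string& begin, const std::string& end) = 0;
    virtual void commit() = 0;
    virtual ~SimpleKVStore() = default;
};

// Trivial map-backed engine, standing in for either the memory or SSD engine.
struct MapKVStore : SimpleKVStore {
    void set(const std::string& k, const std::string& v) override { data[k] = v; }
    std::string get(const std::string& k) override { return data.count(k) ? data[k] : ""; }
    void clear(const std::string& b, const std::string& e) override {
        data.erase(data.lower_bound(b), data.lower_bound(e));
    }
    void commit() override {}  // a real engine would make the batch durable here
    std::map<std::string, std::string> data;
};

// The wrapper: holds both engines and routes each call by key prefix.
class HybridKVStore : public SimpleKVStore {
public:
    HybridKVStore(std::unique_ptr<SimpleKVStore> memory,
                  std::unique_ptr<SimpleKVStore> ssd,
                  std::string memoryPrefix)
        : memory_(std::move(memory)), ssd_(std::move(ssd)), prefix_(std::move(memoryPrefix)) {}

    void set(const std::string& k, const std::string& v) override { route(k).set(k, v); }
    std::string get(const std::string& k) override { return route(k).get(k); }

    void clear(const std::string& b, const std::string& e) override {
        // A range may straddle the prefix boundary, so apply it to both engines.
        memory_->clear(b, e);
        ssd_->clear(b, e);
    }

    void commit() override {
        // One transaction batch may have touched both engines, so both must be
        // durable before the commit is acknowledged (this is the "transactions
        // which span the storage engines" requirement mentioned below).
        memory_->commit();
        ssd_->commit();
    }

private:
    SimpleKVStore& route(const std::string& key) const {
        return key.compare(0, prefix_.size(), prefix_) == 0 ? *memory_ : *ssd_;
    }
    std::unique_ptr<SimpleKVStore> memory_, ssd_;
    std::string prefix_;
};

int main() {
    HybridKVStore store(std::make_unique<MapKVStore>(), std::make_unique<MapKVStore>(), "\xfe");
    store.set("\xfe" "events/1", "ingested log line");  // lands in the memory engine
    store.set("users/alice", "durable record");         // lands in the SSD engine
    store.commit();
    std::cout << store.get("\xfe" "events/1") << "\n" << store.get("users/alice") << "\n";
}
```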

There are two use cases I’m thinking of primarily here:

  1. Ingesting event or log data. If it were to land in the memory engine first, normal operation would be entirely sequential writes with no reads, which is ideal for the large-capacity, cheap block storage available in the cloud. Data would be batched into larger chunks and moved elsewhere (such as the SSD engine or directly to S3). This is similar to what Wavefront does today, from what I understand, except with two different clusters. A new engine is not strictly necessary for this use case, but it would simplify it.

  2. Change data capture for data that lives in FDB already. You could write changes to existing records into the memory engine instead of spending double the IO you would with the SSD engine. The change log is immutable and can be written out to S3 as in the prior example, or heavily compressed by a layer before being written back to the SSD engine.

In both cases, I envision the memory portion being used as a temporary place where data will be stored for a short period and then deleted. There are surely other use cases I haven’t thought of. This does require that the new storage engine be able to handle transactions which span the storage engines.

I think a single configuration option is enough to make it usable. In the configuration for your storage processes, you choose this storage engine by name. Then you supply a prefix where your memory engine will be mounted, leaving everything else as the SSD engine. You would not be allowed to change this at runtime, and all storage processes would have to use the same value. The system keys would only ever be stored on the SSD engine. An easy default could be that the memory storage engine runs at prefix \xFE, so anyone using the existing layers that come with the bindings would need to explicitly choose to use this feature, like creating a directory layer which lives inside \xFE.
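
As a sketch of the routing rule I have in mind (the \xFE default and keeping system keys on SSD come from the paragraph above; the function itself is just illustrative):

```cpp
#include <cassert>
#include <string>

enum class Engine { Memory, SSD };

// System keys (the \xff keyspace) always stay on the SSD engine, keys under
// the configured mount prefix (\xfe by default) go to the memory engine, and
// everything else goes to the SSD engine.
Engine engineForKey(const std::string& key, const std::string& memoryPrefix = "\xfe") {
    if (!key.empty() && static_cast<unsigned char>(key[0]) == 0xff)
        return Engine::SSD;  // system keyspace is never in the memory engine
    if (key.compare(0, memoryPrefix.size(), memoryPrefix) == 0)
        return Engine::Memory;
    return Engine::SSD;
}

int main() {
    assert(engineForKey("\xfe" "changelog/42") == Engine::Memory);
    assert(engineForKey("users/alice") == Engine::SSD);
    assert(engineForKey("\xff/anything") == Engine::SSD);  // system keyspace
}
```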

Another reason to do this is that it has the potential to delay the need to adapt or write an LSM engine from scratch, which I'm sure some people want just for the ability to absorb lots of small writes without requiring a ton of write IOPS.

Thanks again to the team for indulging my questions and suggestions. :blush:


I don’t think one would want to do this as part of a single IKeyValueStore instance? Wouldn’t it be more straightforward to do this as a configurable storage engine by key range?

The high-level implementation would be something like:

  1. Add support for fdbcli> configure memory from \x03 to \x04
  2. Data distribution should not create shards that span storage boundaries (see the sketch after this list)
  3. Shards get assigned to storage processes running the engine type they are configured to use
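
Here is a toy sketch of what a per-range engine configuration and the boundary check in step 2 might look like; the names and types are illustrative only, not FDB's actual data distribution code:

```cpp
#include <cassert>
#include <map>
#include <string>

enum class StoreType { SSD, Memory };

// Illustrative per-range engine configuration: each entry maps the start of a
// range to the engine used from that key up to the next entry. Assumes the map
// always contains an entry for "" so that every key falls into some range.
using EngineConfig = std::map<std::string, StoreType>;

// Step 3: which engine a given key (and therefore its shard) should live on.
StoreType engineFor(const EngineConfig& config, const std::string& key) {
    auto it = config.upper_bound(key);  // first boundary strictly after key
    --it;                               // the range containing key
    return it->second;
}

// Step 2: a candidate shard [begin, end) is only valid if no storage boundary
// falls strictly inside it.
bool shardCrossesBoundary(const EngineConfig& config,
                          const std::string& begin, const std::string& end) {
    auto it = config.upper_bound(begin);
    return it != config.end() && it->first < end;
}

int main() {
    // Mirrors the "configure memory from \x03 to \x04" example above.
    EngineConfig config = { { "", StoreType::SSD },
                            { "\x03", StoreType::Memory },
                            { "\x04", StoreType::SSD } };

    assert(engineFor(config, std::string("\x03") + "event/1") == StoreType::Memory);
    assert(engineFor(config, "\x05") == StoreType::SSD);
    assert(!shardCrossesBoundary(config, "\x03", std::string("\x03") + "zzzz"));
    assert(shardCrossesBoundary(config, "\x02", "\x05"));  // would straddle both boundaries
}
```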

@Evan and @markus.pilman have been chatting about adding a consistent read cache for a configurable key range to FDB, which would probably overlap with the work required to make the storage engine configurable by key range.


That does sound better overall. I was trying to think of the minimum change that would work, but being configurable and under the control of data distribution would be great.

Supporting different storage configurations (not just engine type, but replication configuration, and ideally even asymmetric replica sets like (memory, ssd, ssd)) for different key ranges is definitely the Right Approach, and has been on my wish list for ages.
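
To illustrate what I mean by per-range configurations, something shaped roughly like this (purely illustrative, nothing like FDB's real configuration machinery):

```cpp
#include <string>
#include <vector>

// Purely illustrative shape for per-range storage configuration, including an
// asymmetric replica set like (memory, ssd, ssd). Not FDB's actual config.
struct RangeStoragePolicy {
    std::string begin, end;                   // key range the policy covers
    std::vector<std::string> replicaEngines;  // engine type of each replica
};

std::vector<RangeStoragePolicy> examplePolicies() {
    return {
        { "", "\x03", { "ssd-2", "ssd-2", "ssd-2" } },       // ordinary data
        { "\x03", "\x04", { "memory", "ssd-2", "ssd-2" } },  // asymmetric replica set
        { "\x04", "\xff", { "ssd-2", "ssd-2", "ssd-2" } },
    };
}

int main() { return examplePolicies().size() == 3 ? 0 : 1; }
```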

A “wrapper IKeyValueStore” might be a little bit easier to write, but would probably be a mess to administer long term - how do you ever change the configuration, for example?

Ryan captured my use cases pretty well. As for configuration of a wrapper store, I was thinking of essentially creating a new unique, deterministic name (hashed) for each unique configuration. This would allow you to add new ones at runtime that get written into the system table. Then you would configure them as you would any other storage engine, but with a name like hybrid_[hex code]. That would allow them to be changed dynamically at runtime, much like you would switch between memory and ssd. Wouldn’t the system then be able to do the migration for you?
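
Concretely, I'm imagining something along these lines for deriving the name; the hash and the contents of the configuration here are just placeholders:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative contents of one hybrid-store configuration: which prefix is
// mounted on the memory engine and what sits on either side of the boundary.
struct HybridConfig {
    std::string memoryPrefix = "\xfe";
    std::string memoryEngine = "memory";
    std::string defaultEngine = "ssd-2";
};

// Deterministic hybrid_[hex code] name derived by hashing the configuration
// (64-bit FNV-1a here, purely as a placeholder). Every process that sees the
// same configuration derives the same name, so the name can be written into
// the system table and then referenced like any other storage engine name.
std::string hybridName(const HybridConfig& c) {
    std::string blob = c.memoryPrefix + '\0' + c.memoryEngine + '\0' + c.defaultEngine;
    uint64_t h = 0xcbf29ce484222325ull;  // FNV-1a offset basis
    for (unsigned char b : blob) {
        h ^= b;
        h *= 0x100000001b3ull;           // FNV-1a prime
    }
    char buf[32];
    std::snprintf(buf, sizeof(buf), "hybrid_%016llx", (unsigned long long)h);
    return buf;
}

int main() {
    // Prints the derived name, i.e. hybrid_ followed by 16 hex digits.
    std::printf("%s\n", hybridName(HybridConfig{}).c_str());
}
```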