What is the status of storing extra copies of hot key ranges in memory?

This was in the 7.0 release planning document on the forums:

Storage cache - (Neelam, Xin) - One of the biggest causes of cluster unavailability for FoundationDB are hot read shards. When users drive a lot of read traffic to a small keyspace, the cluster cannot distribute the load to more than a few servers. The goal of this project is to automatically detect these hot read shards, and have a large number of stateless processes (storage cache) start serving reads for this key range. The storage cache role will pull mutations from the TLogs, so it will be able to provide a consistent answer for these reads immediately after an initial fetch of the range.

What is the status of this feature?

The storage cache role is not complete and, as far as I’m aware, there are no plans to work on it any time soon.

FWIW, I think a better direction to serve the same need would be:

  • An ability to increase the replica count for specific shards
  • The additional replicas would be assigned to a special pool of “caching” StorageServers
  • The caching StorageServers are just normal storage servers, except that DataDistribution assigns only a small number of shards to them so that all or most of their shard data fits in RAM
  • The auto-detection of shards in need of more replicas can still be used here
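To make the model above concrete, here is a minimal sketch of the placement decision, not actual FoundationDB code. Every name in it (`Shard`, `CachingServer`, `assign_extra_replicas`, `hot_threshold`, `extra_replicas`, `ram_budget_bytes`) is hypothetical, and the real DataDistribution logic would be considerably more involved. It only shows the core idea: hot shards get extra replicas on a dedicated pool, and a server is never given more shard data than fits in its RAM budget.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    size_bytes: int
    reads_per_sec: float


@dataclass
class CachingServer:
    # Hypothetical model of a "caching" StorageServer with a fixed RAM budget.
    name: str
    ram_budget_bytes: int
    shards: list = field(default_factory=list)

    def used_bytes(self):
        return sum(s.size_bytes for s in self.shards)

    def can_hold(self, shard):
        # A server holds at most one replica of a shard and must keep
        # all of its assigned shard data within the RAM budget.
        fits = self.used_bytes() + shard.size_bytes <= self.ram_budget_bytes
        return shard not in self.shards and fits


def assign_extra_replicas(shards, servers, hot_threshold, extra_replicas):
    """Place up to `extra_replicas` additional copies of each hot shard
    on the caching pool. Returns {shard name: [server names]}."""
    placement = {}
    # Hottest shards are placed first so they get first pick of the pool.
    hot = sorted(
        (s for s in shards if s.reads_per_sec >= hot_threshold),
        key=lambda s: s.reads_per_sec,
        reverse=True,
    )
    for shard in hot:
        # Prefer the servers that currently hold the least data.
        candidates = sorted(
            (srv for srv in servers if srv.can_hold(shard)),
            key=lambda srv: srv.used_bytes(),
        )
        chosen = candidates[:extra_replicas]
        for srv in chosen:
            srv.shards.append(shard)
        placement[shard.name] = [srv.name for srv in chosen]
    return placement
```

A shard that cannot be fully placed simply ends up with fewer extra replicas. Its normal replicas on regular StorageServers are unaffected, which is the robustness property argued for here.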

My main argument for this model is that if you need StorageCaches to handle your workload, then StorageCache reboots will be very painful, because they must reload their assigned shards from StorageServers before they are useful again. Even a modest cache of a few GBs is going to take some time to repopulate, and the situation will be far worse after a full cluster reboot, since all StorageServer caches are cold as well.
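As a rough back-of-envelope illustration of the repopulation cost, the sketch below estimates how long a cache stays cold after a restart. The function name and the example figures (a 4 GiB cache, a 50 MiB/s effective fetch rate) are assumptions chosen for illustration, not measurements from FoundationDB.

```python
def repopulation_seconds(cache_bytes, fetch_bytes_per_sec):
    """Lower bound on the time to refill a cache of `cache_bytes`
    when data arrives at `fetch_bytes_per_sec` (both hypothetical inputs)."""
    if fetch_bytes_per_sec <= 0:
        raise ValueError("fetch rate must be positive")
    return cache_bytes / fetch_bytes_per_sec


GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed numbers: 4 GiB of hot data, fetched at 50 MiB/s.
warmup = repopulation_seconds(4 * GIB, 50 * MIB)
print(f"{warmup:.1f} seconds before the cache is fully warm")
```

Under these assumed numbers the cache is cold for well over a minute, and the effective fetch rate would likely be lower after a full cluster reboot, when the StorageServers being read from are cold too.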

While a StorageCache that does not write to disk and keeps all data in a memory-optimized data structure is faster than a StorageServer with the same amount of data, I think it’s operationally better to use caching StorageServers as described above, potentially more of them, to serve the same traffic in a more robust way.