Are there any 'likely problems' resulting in a storage role exceeding its 8GB RAM and being killed that folks can point me to?

Ratekeeper will throttle normal-priority (non-“system_priority_immediate”) transactions all the way down to 0 under the right conditions, such as more than 2 Storage Queues growing beyond something like 1.8GB.

Based on:

“We didn’t see any noticeable change/problem in our metrics for the storage process data or durability lags, or the log queue length.”

it sounds like that did not happen here because the cluster was still working at some non-zero transaction rate despite all the StorageServer restarts caused by OOMs. This is not unexpected because Redwood is very fast at recovering from disk and returning to normal write throughput. The return-to-performance time will be especially short if the writes have high key locality.

As for the default memory configuration, it’s probably the case that large-scale FDB users have paid little attention to the defaults for several years, while StorageServer memory usage outside of the storage engine’s page cache has grown. FDB does not currently limit its memory usage in reaction to its current usage versus its budget, so it is essentially up to the user to set the cache-memory and memory limits.

  • The cache-memory option sets the page cache size for the Redwood and ssd-2 storage engines. They will reliably stick to this limit aside from an occasional tiny overage when too many page eviction attempts encounter temporarily pinned pages.
  • The memory option sets the total memory limit (specifically RSS, not virtual memory) for the process. For a Storage class process, this setting must be large enough to accommodate the sum of the following (see the configuration sketch after this list):
    • cache-memory
    • Storage engine memory usage aside from its page cache (such as temporary memory used by reads or pending writes)
    • StorageServer memory other than the storage engine memory listed above.
      StorageServer memory will vary based on its current user workload, shard movement activity, and logical data size.
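
For concreteness, here is a minimal sketch of how those two options might look in foundationdb.conf for a storage-class process. The section name (process port) and the sizes are purely illustrative rather than recommendations, and this assumes the conf-file spelling of the options with underscores:

```
[fdbserver.4500]
class = storage
# Page cache budget for the Redwood or ssd-2 storage engine
cache_memory = 4GiB
# RSS limit for the whole process; the headroom above cache_memory is what
# remains for non-cache storage engine memory and general StorageServer memory
memory = 10GiB
```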

I don’t think there is any documentation describing how to choose memory relative to cache-memory; in the fleets I’ve been involved with, we have occasionally updated these settings based on memory usage observations. As a general rule, I think setting memory to (1.5 * cache_memory + 4GB) would be a stable configuration.
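
For example, the 4GiB / 10GiB pairing in the sketch above follows that rule (1.5 * 4GiB + 4GiB = 10GiB), and a larger cache of 8GiB would suggest memory of roughly 1.5 * 8GiB + 4GiB = 16GiB; treat these as starting points to adjust against observed memory usage, as described above.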
