Relax consistency guarantees

Can you try to debug this? Would be interesting to know what bottleneck you’re hitting there. As SS is spiking, I assume the TLogs are fine. It would be awesome if you could try the following:

  1. Replace all clear and clearrange mutations with set mutations. If you hit the same issue, it will tell us that you are simply running into memory pressure on the SS.
  2. Replace your many small clear range mutations with few large ones. Not sure how easy that will be in your workload - but generally I would assume that this would increase performance significantly.
  3. Do you know what the CPU utilization looks like when SS goes up? And what about disk?
  4. If CPU is very high during these periods, you can attach perf to the storage process during one of these spikes. You can do perf record -p PID -g, where PID is the process ID of that storage process. You can then post the results here.
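
For suggestion 2, here is a minimal sketch of one way to turn many small clear ranges into fewer large ones before issuing them. It only merges ranges that overlap or touch, so it never clears keys that were not already covered; whether this helps depends on how contiguous your ranges actually are. The name coalesce_ranges and the (begin, end) tuple layout are assumptions for illustration, not part of any FoundationDB API.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of a clear range:
# begin is inclusive, end is exclusive, both are raw byte keys.
KeyRange = Tuple[bytes, bytes]


def coalesce_ranges(ranges: Iterable[KeyRange]) -> List[KeyRange]:
    """Merge overlapping or adjacent key ranges into as few ranges as possible."""
    merged: List[KeyRange] = []
    for begin, end in sorted(ranges):
        if begin >= end:
            continue  # skip empty or inverted ranges
        if merged and begin <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            last_begin, last_end = merged[-1]
            merged[-1] = (last_begin, max(last_end, end))
        else:
            merged.append((begin, end))
    return merged


if __name__ == "__main__":
    small = [(b"a", b"b"), (b"b", b"c"), (b"x", b"y")]
    for begin, end in coalesce_ranges(small):
        print(begin, end)
```

With the Python bindings, each merged pair would then be passed to tr.clear_range(begin, end) inside the transaction instead of issuing one clear per original range. Ranges separated by live keys cannot be merged this way without deleting data, so for those you would need to change the key layout instead.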