We have been running performance tests with FDB in GCP, and we clear all data between tests using clearrange. We use the standard ssd storage engine. We have noticed that after clearing a sizable data set (several TB), there are sustained disk IOPS for many hours after the clear, even though there are no more reads or writes from the outside. I assume this is related to reclaiming deleted pages of the B-tree backend. Will this background load affect performance when we add new load, and is there a way to monitor this activity so we can separate it from the normal reads/writes coming from the actual workload? Will this behaviour change with Redwood?
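For context, the clear between tests is essentially just this (a minimal sketch using the Python bindings; the API version and key range are placeholders, we actually clear the range written by the benchmark):

```python
import fdb

# Assumption: API version matching a 6.2 cluster.
fdb.api_version(620)

db = fdb.open()

# Clear the whole benchmark keyspace in one database-level operation.
# The begin/end keys below are placeholders for the real test prefix.
begin = b''
end = b'\xff'
db.clear_range(begin, end)
```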
This issue has been raised before: Large clear range performance
It seems to manifest stronger on cloud providers than on bare metal.
Thanks for replying!
Yes, I saw the other post, though in that case it manifested as saturated CPU. I don't see high CPU, but I do see high read activity (~3k read IOPS) for a long time after the range clear.
Could I monitor this by looking at the spring cleaning metrics in the trace logs?
Yes, SpringCleaningMetrics logs metrics about the background cleanup operations. LazyDelete refers to tree traversals that free pages from cleared ranges, and Vacuuming refers to moving pages away from the tail of the file so the file can be truncated to reclaim that space. When both of those activities have gone to (or near) zero, the delayed cleanup is done.
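If it helps, something like this rough sketch will pull those events out of the trace files so you can chart them next to your client metrics (assuming the default XML trace format; the log directory is an assumption, adjust it to your trace_log_dir setting):

```python
import glob
import xml.etree.ElementTree as ET

# Rough sketch: scan FDB trace logs (default XML format) and dump
# SpringCleaningMetrics events so the background cleanup can be tracked
# separately from the client workload. The directory is an assumption.
TRACE_DIR = "/var/log/foundationdb"

for path in sorted(glob.glob(f"{TRACE_DIR}/trace.*.xml")):
    try:
        # Trace files are a stream of <Event .../> elements; iterparse
        # yields each one as soon as it has been fully read.
        for _, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "Event" and elem.get("Type") == "SpringCleaningMetrics":
                # Dump all attributes rather than hard-coding field names;
                # look for the lazy delete and vacuum counters in the output.
                print(elem.attrib)
            elem.clear()
    except ET.ParseError:
        # A file still being written may lack its closing tag.
        pass
```

Simply grepping the logs for Type="SpringCleaningMetrics" works too if you just want to eyeball it.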
I’d also like to add that if the background cleanup speed is a concern for you, then you’ll be happy to know that the upcoming Redwood storage engine will do at least an order of magnitude less I/O in this regard, possibly over 100x less depending on key patterns.
There are some slides here with more details about Redwood’s I/O behavior: FoundationDB Summit 2019: Redwood Storage Engine Update
Thanks, Steve!
Will keep an eye on the SpringCleaningMetrics then, and hopefully this will only be a problem in my synthetic testing, where I delete large data sets to prepare for the next test.
Looking forward to Redwood; our tests showed a nice speedup (~50%) in write performance with the experimental version, and parallel cache misses are likely to be very important when using Kubernetes persistent volumes, which have higher latency than bare-metal disks.