We have been running performance tests with FDB in GCP, and we clear all data between tests using clearrange. We use the standard ssd storage engine. We have noticed that after clearing a sizable data set (several TB), there are sustained disk IOPS for many hours after the clear, even though there are no more reads or writes from the outside. I assume this is related to reclaiming deleted pages of the B-tree backend. Will this background load affect performance when we add new load, and is there a way to monitor this activity so we can separate it from the normal reads/writes coming from the actual workload? Will this behaviour change with Redwood?
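For context, the clear between tests is essentially just this (a minimal sketch using the Python bindings; the API version and key range are placeholders, we actually clear the range written by the benchmark):

```python
import fdb

# Assumption: API version matching a 6.2 cluster.
fdb.api_version(620)

db = fdb.open()

# Clear the whole benchmark keyspace in one database-level operation.
# The begin/end keys below are placeholders for the real test prefix.
begin = b''
end = b'\xff'
db.clear_range(begin, end)
```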
This issue has been raised before: Large clear range performance
It seems to manifest stronger on cloud providers than on bare metal.
Thanks for replying!
Yes, I saw the other post, though in that case it manifested as saturated CPU. I don't see high CPU, but I do see high read activity (~3k read IOPS) for a long time after the range clear.
Could I monitor this by looking at the spring cleaning metrics in the trace logs?
Yes, SpringCleaningMetrics logs metrics about the background cleanup operations. LazyDelete refers to tree traversals that free pages from cleared ranges, and Vacuuming refers to moving pages away from the tail of the file so the file can be truncated to reclaim that space. When both of those activities have gone to (or near) zero, the delayed cleanup is done.
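If it helps, something like this rough sketch will pull those events out of the trace files so you can chart them next to your client metrics (assuming the default XML trace format; the log directory is an assumption, adjust it to your trace_log_dir setting):

```python
import glob
import xml.etree.ElementTree as ET

# Rough sketch: scan FDB trace logs (default XML format) and dump
# SpringCleaningMetrics events so the background cleanup can be tracked
# separately from the client workload. The directory is an assumption.
TRACE_DIR = "/var/log/foundationdb"

for path in sorted(glob.glob(f"{TRACE_DIR}/trace.*.xml")):
    try:
        # Trace files are a stream of <Event .../> elements; iterparse
        # yields each one as soon as it has been fully read.
        for _, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "Event" and elem.get("Type") == "SpringCleaningMetrics":
                # Dump all attributes rather than hard-coding field names;
                # look for the lazy delete and vacuum counters in the output.
                print(elem.attrib)
            elem.clear()
    except ET.ParseError:
        # A file still being written may lack its closing tag.
        pass
```

Simply grepping the logs for Type="SpringCleaningMetrics" works too if you just want to eyeball it.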
I’d also like to add that if the background cleanup speed is a concern for you, then you’ll be happy to know that the upcoming Redwood storage engine will do at least an order of magnitude less I/O in this regard, possibly over 100x less depending on key patterns.
There are some slides here with more details about Redwood’s I/O behavior: FoundationDB Summit 2019: Redwood Storage Engine Update
Thanks, Steve!
Will keep an eye on the SpringCleaningMetrics then, and hopefully this will only be a problem in my synthetic testing, where I delete large data sets to prepare for the next test.
Looking forward to Redwood; our tests showed a nice speedup (~50%) in write performance with the experimental version, and parallel cache misses are likely to be very important when using Kubernetes persistent volumes, which have higher latency than bare-metal disks.