Large clear range performance

andrew.noyes · February 20, 2019, 11:15pm

Hello,

We see performance problems when clearing large ranges in foundationdb3. We see many storage nodes get pegged at 100% cpu, and perf top shows they’re spending their time in walMerge in sqlite. We’re running foundationdb3 on EBS.

A couple questions

Is this improved in foundationdb6?
Would it make sense to expose a throttled clear range in the API? We’re imagining something like issuing a special clear range with a budget for how much work it can do per storage node. This would make sense in the directory layer, where clients would know that a directory was removed and the actual work of clearing the contents of the directory can be deferred.
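
For illustration only, here is a rough client-side approximation of the throttling idea: split the range into sub-ranges and clear them one at a time with a pause in between. This is not an existing FoundationDB API and it does not enforce a per-storage-node budget. The names `throttled_clear`, `clear_range`, `boundaries`, and `pause_seconds` are hypothetical.

```python
import time
from typing import Callable, Sequence


def throttled_clear(
    clear_range: Callable[[bytes, bytes], None],
    boundaries: Sequence[bytes],
    pause_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Clear [boundaries[0], boundaries[-1]) one sub-range at a time.

    clear_range: hypothetical callable that clears [begin, end) and commits.
    boundaries:  sorted split points; consecutive pairs form the sub-ranges.
    Returns the number of sub-ranges cleared.
    """
    sub_ranges = list(zip(boundaries, boundaries[1:]))
    cleared = 0
    for begin, end in sub_ranges:
        clear_range(begin, end)
        cleared += 1
        # Pause between sub-ranges (not after the last one) so that
        # background cleanup work is spread out over time.
        if cleared < len(sub_ranges):
            sleep(pause_seconds)
    return cleared
```

With the Python bindings, `clear_range` could presumably wrap `db.clear_range(begin, end)`, and the split points could come from something like `fdb.locality.get_boundary_keys`. Whether pacing the clears this way actually reduces storage server load is an assumption that would need to be measured.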

gaurav · February 21, 2019, 1:40am

I have been using fdb6, and have cleared almost half a terabyte in a single range clear (on multiple occasions), without seeing the cpu problems you mentioned.

aseipp · February 21, 2019, 2:34am

Just putting in another vote for FDB 6 here: I’ve done 400GB range clears extremely quickly, in < 500ms or so (IIRC). The important bit is that the actual on-disk storage will take longer to clean up in the background, but the key-value ranges themselves are cleared extremely quickly either way.

mengxu · February 21, 2019, 7:03am

@aseipp @gaurav Are you running on EBS or bare metal?

gaurav · February 21, 2019, 8:04am

I have tried this on EBS (gp2 ssd - with around 3K iops provisioned); as well as on ESX virtualized storage env. (vmware).

theos · May 22, 2019, 10:51am

I had the very same issue with the latest FDB on a big setup (3 TB raw data, 9 TB disk used) when clearing half of the database under load: I had a bunch of timeouts for a short period (seconds) and a spike in disk usage up to 100% for a long period (I had to decrease the DB load to get back down from 100% usage).

mengxu · May 23, 2019, 6:37pm

Before you performed the clear operation, what were the average and maximum disk usage?

The default storage engine uses sqlite, which will truncate files when the disk usage is above a threshold. This background job can temporarily increase disk usage further (which is why disk usage spikes during cleanup) and slow down the storage server by eating up a lot of CPU (which is why you may see timeouts).

sandeep-nallapu · December 22, 2022, 6:06pm

I am clearing some keys across a big range, approximately 1000 GB. The clear operation for each key takes around 15 milliseconds, which is quite high: I have 170 million keys, so it takes around 2 weeks to clear them all. Any help on how to make it faster?

andrew.noyes · December 28, 2022, 9:47pm

It sounds like you’re clearing one key per transaction, and you have few commits in flight at a time. Try clearing multiple keys per transaction (sharing the cost of starting and committing transactions), and also have multiple commits in flight to increase throughput. See throughput requires concurrency
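
A minimal sketch of that advice, assuming a hypothetical `clear_batch` callable that clears a list of keys in one transaction and commits it. The helper names, batch size, and thread count are illustrative, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def batches(keys: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Yield consecutive lists of at most batch_size keys."""
    batch: List[bytes] = []
    for key in keys:
        batch.append(key)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def clear_keys_batched(
    clear_batch: Callable[[List[bytes]], None],
    keys: Iterable[bytes],
    batch_size: int = 1000,
    max_in_flight: int = 16,
) -> int:
    """Clear keys in batches, with several batches committing concurrently.

    clear_batch: hypothetical callable that clears every key in the list
                 inside one transaction and commits it.
    Returns the total number of keys handed to clear_batch.
    """
    def run(batch: List[bytes]) -> int:
        clear_batch(batch)
        return len(batch)

    # Executor.map submits all batches up front; the pool size bounds how
    # many of them run (and therefore commit) at the same time.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return sum(pool.map(run, batches(keys, batch_size)))
```

With the Python bindings, `clear_batch` could be an `@fdb.transactional` function that calls `tr.clear(key)` for each key. Batches need to stay within FoundationDB's transaction size and time limits, so the right batch size depends on key length and is worth tuning.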
