I am clearing keys across a big range, approximately 1000 GB. The clear operation for every key takes around 15 milliseconds, which is quite high: I have 170 million keys, so it takes around 2 weeks to clear them all. Any help on how to make this faster?
Delete them in a single clear(Range) operation, if the keys form a contiguous range.
If not, delete them in parallel, e.g. partition the huge key list into 100 smaller lists, start 100 threads, and process each sub-list concurrently.
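A minimal sketch of that partition-and-parallelize idea, assuming you have the full key list in memory. The `clear_batch` function here is a hypothetical placeholder: in a real run it would open a FoundationDB transaction, clear each key (or sub-range) in its chunk, and commit.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(keys, n_parts):
    """Split a key list into n_parts roughly equal, order-preserving sub-lists."""
    size, rem = divmod(len(keys), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        if end > start:
            parts.append(keys[start:end])
        start = end
    return parts

def clear_batch(keys):
    # Hypothetical placeholder: in production this would run a transaction
    # that clears every key in `keys` and commits once.
    return len(keys)

keys = [b"key%06d" % i for i in range(1000)]
with ThreadPoolExecutor(max_workers=100) as pool:
    cleared = sum(pool.map(clear_batch, partition(keys, 100)))
print(cleared)  # 1000
```

Since each sub-list covers a disjoint slice of the keyspace, the parallel transactions cannot conflict with each other.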
The bottleneck you have is not the actual deletion but the commit pipeline. In addition to what @Imperatorx said, you should probably put more than one clear in each transaction. That way you pay less per commit (though you don’t want to make the transaction huge either – I would probably aim for ~10-100 KB per transaction – you can estimate the size by adding up the sizes of the keys or key ranges being cleared).
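The batching step can be sketched as plain Python: group keys into batches whose summed key sizes stay under a byte budget, then issue one transaction per batch. The byte budget and the per-batch commit logic described in the comments are assumptions, not a fixed recipe.

```python
def batch_by_size(keys, max_bytes=50_000):
    """Group keys into batches whose summed key sizes stay under max_bytes,
    so each transaction's commit payload is roughly in the 10-100 KB band."""
    batches, current, current_bytes = [], [], 0
    for key in keys:
        if current and current_bytes + len(key) > max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(key)
        current_bytes += len(key)
    if current:
        batches.append(current)
    return batches

keys = [b"user/%08d" % i for i in range(10_000)]
batches = batch_by_size(keys, max_bytes=50_000)
# For each batch you would then run one transaction that clears
# every key in the batch and commits once.
print(len(batches))
```

This amortizes the commit latency over thousands of clears instead of paying ~15 ms per key.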
Yes, I am deleting them in a single range operation, but it’s difficult to track how much data we are clearing when deleting a range. I think there should be a way to track the amount of data for a range too. Previously I was adding up len(key.bytes) for each key deletion.
You can estimate the size in bytes of data within a key range using the getEstimatedRangeSizeBytes transaction API (e.g. tr.get_estimated_range_size_bytes(begin, end) in the Python bindings).