Directory recreate and leftover data

randyhu · November 1, 2021, 4:29pm

We have a “reset” operation that drops the existing subspace by deleting its directory and then creates a new directory at the same path. Occasionally we notice leftover data, judging by the key-value size reported in status, even though the subspace of the new directory is empty. This seems more likely to happen with large datasets (for example, 150M key-value entries within the subspace) or while the cluster is still rebalancing.
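
A minimal sketch for recording the "key-value size in status" figure between runs, assuming it corresponds to the `cluster.data.total_kv_size_bytes` field of the `status json` output from `fdbcli`, and that `fdbcli` is on `PATH`. The function names are hypothetical; the sketch only shells out to `fdbcli` and does the per-run arithmetic, without using the FoundationDB client API.

```python
import json
import subprocess


def read_status_json(cluster_file=None):
    """Run `fdbcli --exec "status json"` and parse its output.

    Assumes fdbcli is installed and on PATH; `cluster_file`, if given,
    is passed with -C.
    """
    cmd = ["fdbcli"]
    if cluster_file is not None:
        cmd += ["-C", cluster_file]
    cmd += ["--exec", "status json"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def total_kv_size_bytes(status):
    """Cluster-wide sum of key-value sizes from a parsed status document
    (assumed field: cluster.data.total_kv_size_bytes)."""
    return status["cluster"]["data"]["total_kv_size_bytes"]


def growth_per_run(sizes):
    """Change between consecutive measurements, e.g. one taken after each load."""
    return [after - before for before, after in zip(sizes, sizes[1:])]
```

For example, calling `total_kv_size_bytes(read_status_json())` after each batch load and passing the collected values to `growth_per_run` gives the increase observed for each run.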

We observed the issue over multiple iterations of batch loading the same dataset, where each iteration “resets” the directory before loading. The sum of key-value sizes grew considerably after each run. So I have a few questions:

Does the directory layer assign the same prefix to a directory after it has been recreated? I'd guess not, otherwise we would see leftover entries in the new directory.
When a directory is deleted, are the K/V entries within its subspace removed synchronously, or is that a background job?
We did wait for the directory removal to complete in the API before loading the dataset again. We also tried waiting a long time (over 10 hours) after the loading job, but the key-value size did not change and stayed far above the dataset volume. If the subspace is removed asynchronously, what could prevent the entries from being removed?
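
One way to test both guesses is to record the directory's raw prefix before and after the reset, then look for keys under the old prefix once the load has finished. A minimal sketch with the FoundationDB Python bindings, assuming `tr` is an fdb transaction (for example inside an `@fdb.transactional` function) and `directory` is the default directory layer `fdb.directory`; both helper names are hypothetical.

```python
def reset_and_report(tr, directory, path):
    """Hypothetical helper: reset `path` and return (old_prefix, new_prefix).

    Assumes `tr` is an fdb Transaction and `directory` is a directory layer
    such as fdb.directory. old_prefix is None if `path` did not exist.
    """
    old_prefix = None
    if directory.exists(tr, path):
        # key() returns the raw prefix bytes of the directory's subspace.
        old_prefix = directory.open(tr, path).key()
    # Removes the directory, its contents and its subdirectories.
    directory.remove_if_exists(tr, path)
    new_prefix = directory.create(tr, path).key()
    return old_prefix, new_prefix


def has_keys_under(tr, prefix):
    """Return True if at least one key still starts with `prefix`.

    Run this in a later transaction than the reset: a transaction reads
    its own writes, so checking in the same one always sees the range cleared.
    """
    for _ in tr.get_range_startswith(prefix, limit=1):
        return True
    return False
```

Comparing the two prefixes answers the first question directly; a `True` from `has_keys_under(tr, old_prefix)` in a later transaction would mean data is still stored under the removed directory's prefix. The Python API documentation for `remove` also warns that clients that have already opened the directory might still insert data into its contents after it is removed, so a loader holding on to an old `DirectorySubspace` is one possible source of such keys.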

Thanks,

Randy
