We have a “reset” operation that drops an existing subspace by removing its directory and then creates a new directory at the same path. Occasionally we notice leftover data. The key-value size reported in status stays high, but the subspace associated with the new directory is empty. This seems more likely to happen with large datasets (for example, 150M K/V entries within the subspace) or while the cluster is still rebalancing.
We observed the issue over multiple iterations of batch loading the same dataset, where each iteration “resets” the directory before loading. The total key-value size increased substantially after each run. So we have a couple of questions:
- Does the directory layer assign the same prefix to a directory after it has been recreated? We guess the answer is no, because otherwise the leftover entries would show up in the new directory.
- When a directory is deleted, are the K/V entries within its subspace removed synchronously, or is this done by a background job?
- We did wait for the directory removal to complete in the API before loading the dataset again. We also tried waiting a long time (over 10 hours) after the loading job, but the key-value size did not change and remained far above the dataset volume. If the subspace is removed asynchronously, what could prevent the entries from being removed?