One of the issues we had in the past was with atomic ADD leaking values decremented to 0, which was a common problem when updating static counters used in indexes.
Say we have an index that adds (..., 0, VALUE, ID) = ''
for each indexed document, and also atomically increments a counter like (..., 1, VALUE) = (64-bit counter)
when a document is indexed for VALUE, and decrements it when the document is later mutated to some other value.
The issue was that decrementing a value down to 0 would keep the key present in the database.
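To make the failure mode concrete, here is a minimal in-memory sketch (a plain dict standing in for the database, with a hypothetical `atomic_add` emulating the ADD mutation's little-endian arithmetic — not real client code) showing that decrementing the counter back to 0 leaves the key behind:

```python
import struct

db = {}

def atomic_add(key, param):
    # Emulates an ADD mutation: little-endian addition of `param` to the
    # current value (missing keys count as zero), truncated to len(param).
    old = db.get(key, b"\x00" * len(param))
    total = int.from_bytes(old, "little") + int.from_bytes(param, "little")
    db[key] = (total % (1 << (8 * len(param)))).to_bytes(len(param), "little")

def index_document(value, doc_id):
    db[("idx", 0, value, doc_id)] = b""
    atomic_add(("idx", 1, value), struct.pack("<q", 1))   # counter += 1

def unindex_document(value, doc_id):
    del db[("idx", 0, value, doc_id)]
    atomic_add(("idx", 1, value), struct.pack("<q", -1))  # counter -= 1

index_document("ts-1700000000", "doc1")
unindex_document("ts-1700000000", "doc1")
# The index entry is gone, but the counter key is still present,
# holding eight zero bytes:
print(db)
```

With a timestamp-like value that is never indexed again, that zero-valued counter key is never overwritten and just accumulates.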
- For “natural” indexes, or ones with a small pool of values that have a high probability of being reused, this may not be an issue.
- For indexes on “time based” or random-ish fields this is an issue: if the value is a timestamp, or some transient ID that will never be reused, we would leak a ton of
(..., TIMESTAMP) = 00 00 00 00 00 00 00 00
in the dataset which could consume a lot of space in the long run (I had to scrub these multiple times, and they would amount to more than 3/4 of the total data size in one extreme case).
Has anything been done about this case? I remember asking whether decrementing a key down to 0 could also remove it, or whether a COMPARE_AND_CLEAR
atomic operator could be added, which would only delete a key if its value was equal to the argument.
i.e., in pseudocode:
SET((..., 0, VALUE, ID), '')
ATOMIC_ADD((..., 1, VALUE), <FF FF FF FF FF FF FF FF>)
ATOMIC_COMPARE_AND_CLEAR((..., 1, VALUE), <00 00 00 00 00 00 00 00>)
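For clarity, here is the behavior I would expect from such an operator, again sketched against a plain dict (the name `compare_and_clear` is just my proposal, not an existing API):

```python
db = {}

def compare_and_clear(key, param):
    # Atomically delete `key` only if its current value equals `param`;
    # if the key is absent or holds a different value, do nothing.
    if db.get(key) == param:
        del db[key]

ZERO = b"\x00" * 8

db[("idx", 1, "VALUE")] = ZERO
compare_and_clear(("idx", 1, "VALUE"), ZERO)  # value matched: key removed

db[("idx", 1, "OTHER")] = (1).to_bytes(8, "little")
compare_and_clear(("idx", 1, "OTHER"), ZERO)  # count is 1: left untouched
```

Issuing this right after the ATOMIC_ADD decrement would make the zero-valued counter keys disappear, while counters still in use would survive the call unchanged.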
I had to write some custom logic that automatically disables the counter when the indexed attribute is of type DateTime or Guid, but this was a bit clunky.