Atomic_Add(key, -1) can "leak" keys with value 0 if you are not careful

(Christophe Chevalier) #1

One of the issue we had in the past, was with atomic ADD leaking values decremented to 0, which was a common issue when updating static counters used in indexes.

Say we have an index that adds (..., 0, VALUE, ID) = '' for each indexed document, and also atomically increments a counter like (..., 1, VALUE) = (64 bits counter) when a document is indexed for VALUE, and decrements it when the document is later mutated to some other value.

The issue was that decrementing a value down to 0 would keep the key present in the database.

  • For “natural” indexes, or ones with a small pool of values that have a high probability to be reused again, this may not be an issue.
  • For indexes on “time based” or random-ish fields this is an issue: if the value is a timestamp, or some transient ID that will never be reused, we would leak a ton of (..., TIMESTAMP) = 00 00 00 00 00 00 00 00 in the dataset which could consume a lot of space in the long run (I had to scrub these multiple times, and they would amount to more than 3/4 of the total data size in one extreme case).

Has there been any thing done about this case? I remember asking if maybe decrementing the key down to 0 could maybe also remove it. Or maybe add having a COMPARE_AND_CLEAR atomic operator which would only delete a key if it its value was equal to the argument.

ie: in pseudo code:

SET((..., 0, VALUE, ID), '')
ATOMIC_COMPARE_AND_CLEAR((..., 1, VALUE), <00 00 00 00>)

I had to do some custom logic which would automatically disable this feature when the attribute was of type DateTime or Guid, but this was a bit clunky.

(Andreas Smas) #2

I don’t have much to add except that I’ve also identified this problem and is interested in a solution.

(Evan Tschannen) #3

Adding a new atomic operation that will delete if the value becomes zero should be fairly simple to add. This could be a good first task for someone looking to contribute to FoundationDB. Please create an issue in GitHub for this request.


(David Scherer) #4

I like “compare and clear” as nice and general, but I also wouldn’t argue too strongly against “add and clear if zero” as a further minor optimization.

I suppose the limit of this integral is ATOMIC_EXECUTE_LUA.

(Christophe Chevalier) #5

I have opened issue as requested. I won’t have the time to work on this particular subject, so if someone wants to take it, please do !

I would love an ATOMIC_EXECUTE_LUA ! It would work great in conjunction with ATOMIC_EXECUTE_SQL_QUERY :slight_smile: