Limiting the cardinality of a key range


Sorry if this should be obvious, but I was wondering if there was a preferred mechanism for limiting a key range to some cardinality (i.e. “messages/* should have no more than 10,000 keys”).

The documentation points out that offsets are O(n) complexity, so it seems expensive to do a clear(start + 10,000, end) with every transaction affecting that key range.

Does this becomes easier if the constraint is relaxed? (i.e. “messages/* should not have significantly more than 10,000 keys”).

(A.J. Beamon) #3

If you want to support limiting the size of a range while supporting concurrency, you’ll probably need to rely on atomic operations to keep track of its size. There is some discussion about that here:

Depending on the nature of your writes, if you are willing to enforce this at write time (rather than by clearing keys after having exceeded the limit), you could disallow writes to messages when the counter exceeds 10,000. If you have lots of concurrent writers and/or if they are each writing lots of keys to “messages”, then the counter could exceed this limit by a sizable margin.

If you instead want to be able to estimate where the 10,000th key is and then clear all keys after that, then you could try something like what Dave described in the link above called the ranked set. Alec provides a more detailed description here:

And there’s an implementation of the idea here in C#, though I haven’t looked at it to see how closely it matches the previous descriptions: