Use KeySelector.firstGreaterThan or just skip one KeyValue during iteration?

Hi! I wonder what is faster, using tx.getKey with KeySelector.firstGreaterThan(begin), or just use tx.getRange(begin, end) and then skip one KeyValue during iteration?

Certainly getRange() with a KeySelector on begin rather than tx.getKey and then passing it to getRange (if that’s your ultimate goal). getKey() requires a round-trip.

My goal is to get a range from Tuple.from("x", begin).pack (exclusive) to Tuple.from("x").end (inclusive)

Oh, so its

  KeySelector.firstGreaterThan(Tuple.from("x", begin).pack), 

End is exclusive for getRange(), so you need to say firstGreaterThan() for the end if you want it to be inclusive.


Right. It works, thank you! :slight_smile: Still wrapping my head around keys and ranges.

1 Like

As to your original question, efficiency wise, using the key selector or doing a range query and then skipping (as appropriate) is roughly the same. (Well, you don’t have send the bytes from the skipped key value pair over the wire, but disk I/O wise it’s the same and network I/O it’s almost always the same.) If you can use key selectors, that’s probably better (given that you save network I/O), but there aren’t behind-the-scenes savings at, say, the storage server level.

I the think a good rule of thumb is if you need to seek and then know the key that you landed on (which I find is usually the case), then you need to use getRange, and if you just need to know the value, then you can use getKey. One other caveat is that getKey doesn’t offer any kind of bounds checking while getRange can. (E.g., if you have a key k in a subspace s and want to find the first key greater than k but still in subspace s, you need to do something like tr.getRange(KeySelector.firstGreaterThan(k), KeySelector.firstGreaterThan(s.range().end), limit=1).) Another rule of thumb that will probably get you most of the way there most of the time is to always use getRange.

EDIT: getKey to getRange to make my response almost make sense

For what it’s worth, by default getKey is currently implemented by issuing a range read request to the storage server anyway. I think that if your transaction has disabled read your writes, though, it would issue a get key request instead.

Ah, I mistook the original question as to whether to call getKey() first and pass it to getRange() vs. doing it just with getRange().

My goal is to get a range from Tuple.from(“x”, begin).pack (exclusive) to Tuple.from(“x”).end (inclusive)

If I understand you correctly, neither of the proposed solutions is right for this goal, because the key Tuple.from("x", begin).pack might not be present in the database!

What you want to do is to start your range read from the next possible key, which is accomplished by adding a zero byte to the end of the key. I think this is available in many bindings with a name like “keyAfter()” but in a pinch you could write it yourself.

Since you say you want the end of the range to be inclusive, you should do that to the end key as well.

Using a firstGreaterThan key selector as the begin argument to the range read accomplishes this, right? And using firstGreaterThan(x) at the end means the first key excluded is the next key after x, so it would be inclusive of x.


Yes, you are right, that is exactly the same. My bad. And maybe the implementation actually optimizes getRange(firstGreater(x),…) to getRange(keyAfter(x),…) anyway, so that there is no efficiency difference?

keyAfter() seems to only be defined by the flow binding:

Is this something that is frequent enough to justify being exposed by other bindings?

What about keyBetween(..) which seems to only be used by the StorageMetrics actor ?

I think it’s a useful utility function that wouldn’t be out of place in a binding, particularly if appending a null byte to a key is syntactically ugly. It also is perhaps a little more obvious to the reader what it’s going compared to doing the append by hand.