Use KeySelector.firstGreaterThan or just skip one KeyValue during iteration?

notxcain · May 8, 2018, 7:52am

Hi! I wonder which is faster: calling tx.getKey with KeySelector.firstGreaterThan(begin), or just calling tx.getRange(begin, end) and skipping one KeyValue during iteration?

panghy · May 8, 2018, 8:01am

Certainly getRange() with a KeySelector as the begin argument, rather than calling tx.getKey and then passing the result to getRange (if that’s your ultimate goal). getKey() requires a round-trip.

notxcain · May 8, 2018, 8:07am

My goal is to get a range from Tuple.from("x", begin).pack (exclusive) to Tuple.from("x").end (inclusive)

notxcain · May 8, 2018, 8:10am

Oh, so it’s

getRange(
  KeySelector.firstGreaterThan(Tuple.from("x", begin).pack), 
  KeySelector.lastLessOrEqual(Tuple.from("x").end)
)

panghy · May 8, 2018, 8:11am

End is exclusive for getRange(), so you need to say firstGreaterThan() for the end if you want it to be inclusive.
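
To make the end-exclusive rule concrete, here is a minimal sketch that models selector resolution with a java.util.TreeMap instead of a real FoundationDB transaction. The class RangeModel and its methods are hypothetical names for illustration only, and keys are plain strings rather than packed tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory model of key-selector resolution; NOT the FoundationDB API.
public class RangeModel {
    private final TreeMap<String, String> db = new TreeMap<>();

    public void set(String key, String value) {
        db.put(key, value);
    }

    // Models firstGreaterThan(k): first stored key strictly after k, or null.
    public String firstGreaterThan(String k) {
        return db.higherKey(k);
    }

    // Models firstGreaterOrEqual(k): first stored key at or after k, or null.
    public String firstGreaterOrEqual(String k) {
        return db.ceilingKey(k);
    }

    // Models a range read over resolved keys: begin inclusive, end exclusive.
    // A null begin means nothing matched; a null end means "read to the end".
    public List<String> getRange(String begin, String end) {
        if (begin == null) {
            return new ArrayList<>();
        }
        if (end == null) {
            return new ArrayList<>(db.tailMap(begin, true).keySet());
        }
        if (begin.compareTo(end) >= 0) {
            return new ArrayList<>();
        }
        return new ArrayList<>(db.subMap(begin, true, end, false).keySet());
    }
}
```

In this model, with keys a, b, c, d stored, resolving both ends with firstGreaterThan("a") and firstGreaterThan("c") yields b and c (begin exclusive, end inclusive), while resolving the end with firstGreaterOrEqual("c") yields only b.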

notxcain · May 8, 2018, 8:14am

Right. It works, thank you! Still wrapping my head around keys and ranges.

alloc · May 9, 2018, 4:25am

As to your original question, efficiency-wise, using the key selector or doing a range query and then skipping (as appropriate) is roughly the same. (Well, you don’t have to send the bytes from the skipped key-value pair over the wire, but disk I/O-wise it’s the same, and network I/O is almost always the same.) If you can use key selectors, that’s probably better (given that you save network I/O), but there aren’t behind-the-scenes savings at, say, the storage server level.

I think a good rule of thumb is if you need to seek and then know the key that you landed on (which I find is usually the case), then you need to use getRange, and if you just need to know the value, then you can use getKey. One other caveat is that getKey doesn’t offer any kind of bounds checking while getRange can. (E.g., if you have a key k in a subspace s and want to find the first key greater than k but still in subspace s, you need to do something like tr.getRange(KeySelector.firstGreaterThan(k), KeySelector.firstGreaterThan(s.range().end), limit=1).) Another rule of thumb that will probably get you most of the way there most of the time is to always use getRange.
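
The bounds-checking caveat can be illustrated with a small sketch, again modeled on a java.util.TreeMap rather than on the real client. BoundedSeek, nextKeyUnbounded and nextKeyBefore are hypothetical names, and the string "s0" merely stands in for the exclusive end of a subspace whose keys start with "s/".

```java
import java.util.Optional;
import java.util.TreeMap;

// Toy in-memory model contrasting an unbounded seek with a bounded one;
// NOT the FoundationDB API.
public class BoundedSeek {
    private final TreeMap<String, String> db = new TreeMap<>();

    public void set(String key, String value) {
        db.put(key, value);
    }

    // Models getKey(firstGreaterThan(k)): nothing stops the result from
    // landing on a key that belongs to a different subspace.
    public Optional<String> nextKeyUnbounded(String k) {
        return Optional.ofNullable(db.higherKey(k));
    }

    // Models a range read with limit 1 and an exclusive end bound:
    // a key at or past the bound is never returned.
    public Optional<String> nextKeyBefore(String k, String endExclusive) {
        String next = db.higherKey(k);
        if (next == null || next.compareTo(endExclusive) >= 0) {
            return Optional.empty();
        }
        return Optional.of(next);
    }
}
```

With keys s/1, s/2 and t/1 stored, the unbounded seek from s/2 lands on t/1, which is outside the subspace, whereas the bounded seek with end "s0" returns nothing.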

EDIT: getKey to getRange to make my response almost make sense

ajbeamon · May 9, 2018, 2:13pm

For what it’s worth, by default getKey is currently implemented by issuing a range read request to the storage server anyway. I think that if your transaction has disabled read-your-writes, though, it would issue a get key request instead.

panghy · May 9, 2018, 8:25pm

Ah, I mistook the original question as to whether to call getKey() first and pass it to getRange() vs. doing it just with getRange().

dave · May 11, 2018, 12:48pm

My goal is to get a range from Tuple.from("x", begin).pack (exclusive) to Tuple.from("x").end (inclusive)

If I understand you correctly, neither of the proposed solutions is right for this goal, because the key Tuple.from("x", begin).pack might not be present in the database!

What you want to do is to start your range read from the next possible key, which is accomplished by adding a zero byte to the end of the key. I think this is available in many bindings with a name like “keyAfter()” but in a pinch you could write it yourself.

Since you say you want the end of the range to be inclusive, you should do that to the end key as well.
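
For bindings that lack such a helper, a hand-written version might look like the sketch below. KeyAfterSketch and its methods are hypothetical names, not part of any FoundationDB binding; the sketch assumes Java 9 or later for Arrays.compareUnsigned and does not replicate any special-casing a real client might apply to reserved keys.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical helper; not part of the FoundationDB Java binding.
public class KeyAfterSketch {
    // Returns a copy of key with a single 0x00 byte appended. Under unsigned
    // lexicographic ordering this is the smallest key strictly greater than key.
    public static byte[] keyAfter(byte[] key) {
        // Arrays.copyOf pads the extra position with zero.
        return Arrays.copyOf(key, key.length + 1);
    }

    // An ordered set of keys compared as unsigned bytes, used here only to
    // check the ordering claim against an in-memory model.
    public static TreeSet<byte[]> newKeySet() {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        return new TreeSet<>(unsignedOrder);
    }
}
```

In the in-memory set, the first key strictly greater than x is the same element as the first key greater than or equal to keyAfter(x), whether or not x itself is stored.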

ajbeamon · May 11, 2018, 2:45pm

Using a firstGreaterThan key selector as the begin argument to the range read accomplishes this, right? And using firstGreaterThan(x) at the end means the first key excluded is the next key after x, so it would be inclusive of x.

dave · May 11, 2018, 3:12pm

@ajbeamon

Yes, you are right, that is exactly the same. My bad. And maybe the implementation actually optimizes getRange(firstGreaterThan(x),…) to getRange(keyAfter(x),…) anyway, so that there is no efficiency difference?

KrzysFR · May 11, 2018, 4:21pm

keyAfter() seems to only be defined by the flow binding:

https://github.com/apple/foundationdb/blob/ca720e15403bbe05a28a02b67d54b84bed9e09e3/fdbclient/FDBTypes.h#L264

};


typedef Standalone<KeyRef> Key;
typedef Standalone<ValueRef> Value;
typedef Standalone<KeyRangeRef> KeyRange;
typedef Standalone<KeyValueRef> KeyValue;
typedef Standalone<struct KeySelectorRef> KeySelector; 


enum { invalidVersion = -1, latestVersion = -2 };


inline Key keyAfter( const KeyRef& key ) {
	if(key == LiteralStringRef("\xff\xff"))
		return key;


	Standalone<StringRef> r;
	uint8_t* s = new (r.arena()) uint8_t[ key.size() + 1 ];
	memcpy(s, key.begin(), key.size() );
	s[key.size()] = 0;
	((StringRef&) r) = StringRef( s, key.size() + 1 );
	return r;
}

Is this something that is frequent enough to justify being exposed by other bindings?

What about keyBetween(..), which seems to only be used by the StorageMetrics actor?

ajbeamon · May 12, 2018, 12:34am

I think it’s a useful utility function that wouldn’t be out of place in a binding, particularly if appending a null byte to a key is syntactically ugly. It is also perhaps a little more obvious to the reader what it’s doing compared to doing the append by hand.
