FoundationDB

Ranges without explicit end (Go)

bindings

#1

It’s unclear from the documentation, particularly those for the Go bindings, how you can construct a selector range which doesn’t have an explicit end.

For example, assume that tuples are {"age", <n>}. You can easily do a search starting at the beginning:

result := tx.GetRange(fdb.SelectorRange{
  Begin: fdb.FirstGreaterThan(tuple.Tuple{"age"}),
  End: fdb.LastLessThan(tuple.Tuple{"age", 100},
}), fdb.RangeOptions{})

This will find all records from the lowest age up to {age, 99}. However, it’s unclear to me how to create an “open” range (i.e. corresponding to SQL age > 30). Frequently in the documentation \xFF is used as a placeholder, but this doesn’t work:

result := tx.GetRange(fdb.SelectorRange{
  Begin: fdb.FirstGreaterThan(tuple.Tuple{"age"}),
  End: fdb.LastLessThan(tuple.Tuple{"age", "\xFF"},
}), fdb.RangeOptions{})

Using End: fdb.LastLessThan(fdb.Key("\xFF")) does work, but this obviously spills into other, unrelated tuple prefixes.

I could also just use a synthetic maximum: fdb.LastLessThan(tuple.Tuple{"age", 2147483648}). I would like to avoid that.

(It’d be nice if Begin: nil and End: nil actually were valid offsets.)


(Christophe Chevalier) #2

There really needs to be a big red warning on top of the documentation about tuples and tuple encoding, because I see a lot of people falling victim of the same pitfall:

  • Strings, in the tuple encoding, are escaped as 0x02 followed by the utf-8 bytes, and terminated by a NUL byte, so "\xFF" is not the FF byte (a single byte), but the string composed of a single character with code point 255, which translates to 02 FF 00 (well not really, code point 255 would be encoded by UTF-8 into some sequence of bytes that I don’t remember off the top of my head, but you get the point).
  • Integers use a smart trick to preserve ordering, and 0 is encoded as 14, and small integers (<255) are encoded as 15 xx, then 16 xx yy for 2-byte integers and so on.

So, the tuple {…, “\xFF” }, once transformed into bytes by the tuple encoding, represents a key that is actually BEFORE the tuple {…, 1} ! (... 02 FF 00 < .... 15 01).

If the end key selector is less than the begin key selector, the set is by definition empty and your get_range will never return anything! :slight_smile:

If you want to generate a byte sequence that will always be greater than whatever value (or type!) you could fit in the last element for the age value, you would need to provide of a custom token that would be recognized by the tuple encoder as a special value that outputs as a single FF byte. So not a string, integer, or any other common type. I don’t know if the go implementation has this special singleton or not…

And of course, you could also use a different encoding scheme than the tuple encoding (which is used by default, but is not the only one possible!) and you’d still want to be able to do that.

If you want a generic way to create such an open ended range, there are two ways:

  1. add a single FF byte at the end of the result of packing the tuple (i.e: add it to the resulting byte array, not the tuple’s items). So encode the tuple {"age", } into bytes (will output something like 02 a g e 00) and append to that the FF byte (to get 02 a g e 00 FF). With most tuple encoders, this would NOT be a valid encoding (since FF is not a valid type header). Other encoders may accept the FF anyway, depends on the binding`s implementation.

  2. increment the last byte of the key prefix, so for our tuple above, ‘02 a g e 00’ + 1 = 02 a g e 01. Again, this is NOT a valid tuple encoding (un-terminated string). If the last byte is already FF then carry the one like regular addition in base 10.

But in both cases, all possible tuples { "age", <some_int> } are less than both 02 a g e 00 FF and 02 a g e 01.

The first solution works for tuple encoding, because it NEVER* produces with items that starts with FF (*: some do! :slight_smile: ), but other encodings MAY NOT, so solution 2. is universal, while 1. is a bit easier to do in practice and works fine with tuples.

I think in most bindings, there is a method called increment (or with increment in the name) that does that for you.


Edit: so yeah, if the go binding would introduce some singleton of a custom type (let’s call it ALEPH_0), that would be encoded as FF then you could write fdb.lastLessThan(tuple.Tuple("age", ALEPH_0)) to get what you want. This would not be a canonical encoding with the current spec, though. In the mean time, when dealing with integers, you can cheat a little and think of 2147483648 as the best approximation of ALEPH_0 we can use right know :slight_smile: )


(Alec Grieser) #3

Firstly, ℵ0 would be a pretty chill constant.

It looks like the go bindings are missing that increment method, and in fact, both the tuple and subspace layers re-implement a concat method to concatenate methods together. I filed an issue about this: https://github.com/apple/foundationdb/issues/846


#4

Ack. Thanks for the detailed explanation, this clears up a few things!

I guess I’m somewhat tripped up by how FoundationDB’s client API is so low-level, or rather that the tuple encoding seems to be an explicit part of the API, something you wouldn’t see with, say, PostgreSQL. I understand the necessity, to an extent, but the Go API (which, being Go, doesn’t distinguish between bytes and strings) doesn’t make it easier.

A special singleton to represent the FF would certainly help.


(Christophe Chevalier) #5

The Tuple encoding is indeed maybe too much coupled with some bindings… It is such a useful tool because it allows you to think about your data in term of vectors, instead of raw bytes, that after a while you start forgetting that it is only a convention on how to encode keys. But sometimes you have to come back down to earth and remember that the API is very low level indeed and that in the end you can decide to encode things however you like.

I tried to solve this issue with the .NET binding (ability to have custom encodings and type systems) but the complexity goes up to 11 immediately. You are now introducing encoding schemes, encoders/decoders, dynamic vs statically typed, fixed vs arbitrary length, vector manipulation, negative indexing, tuple deconstruction, etc… You really need to already have a good grip on what’s going on under the hood (at the byte level), before being able to understand why you need a complex type system like this.


(A.J. Beamon) #6

The Tuple APIs should all support querying a Tuple range that will return the sub-elements of a Tuple (e.g. for your tuple {"age", <n>}, you could query all elements starting with {"age",}. Note that this range excludes the prefix tuple itself, so if you have a key {"age"}, it won’t return that.

In Go, you would do this with https://godoc.org/github.com/apple/foundationdb/bindings/go/src/fdb/tuple#Tuple.FDBRangeKeys. If you want to perform a query like age > 30, then you could probably create a new range that sets the begin key to be fdb.FirstGreaterThan(tuple.Tuple{"age",30}) and the end key to be the one extracted from the Tuple range.


#7

and the end key to be the one extracted from the Tuple range.

What do you mean by “the one extracted from the Tuple range”? Determining what the end key should be is the point of my question. The above comment already provides the solution, by the way.


(A.J. Beamon) #8

There’s a function in the API already to produce a Tuple range (see the link in my comment) which is intended to make it easy to do the query you were trying to perform. If you wanted only one half of that range (e.g. the end key), then you could create the range and grab the end key out of it.


#9

Still not sure what you men by “grab the end key out of it”. Do you mean to actually query FDB to find the last key, then plug that into the range query? I’d want to avoid the round trip.

You shouldn’t need to ask FDB for keys in order to do an “open” range query. All you want is either the key that sorts lexicographically after the highest possible key (so tuple.Tuple{"age"} + 1 or tuple.Tuple{"age", ALEPH_0}, as @KrzysFR explained. My problem was that I was using 0xFF (the ALELPH_0) inside the tuple encoding instead of outside it.


(Christophe Chevalier) #10

I think @ajbeamon meant to call FDBRangeKeys() and use the second element of the pair that is returned, as the key of your end selector.

Key selectors can point to non-existent keys in the database, so you don’t need to actually do a read to get the last key, and use that for the range. It is most probable that the FDBRangeKeys helper either appends a 0xFF byte, or increment the whole key, so it does what you want already.


(A.J. Beamon) #11

Here is an example to demonstrate what I mean:

_, e := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
	tr.Set(tuple.Tuple{"age", 30}, []byte("test"))
	tr.Set(tuple.Tuple{"age", 31}, []byte("test"))
	return nil, nil
})

if e != nil {
	log.Fatalf("Unable to perform FDB transaction (%v)", e)
}

fmt.Printf("All ages:\n")
db.Transact(func(tr fdb.Transaction) (interface{}, error) {
	t := tuple.Tuple{"age"}
	r := tr.GetRange(t, fdb.RangeOptions{}).GetSliceOrPanic()

	for _, kv := range r {
		t, _ := tuple.Unpack(kv.Key)
		fmt.Printf("%d = %s\n", t[1].(int64), string(kv.Value))
	}

	return nil, nil
})

fmt.Printf("\nage > 30:\n")
db.Transact(func(tr fdb.Transaction) (interface{}, error) {
	t := tuple.Tuple{"age"}
	_, endKey := t.FDBRangeKeys()

	r := tr.GetRange(fdb.SelectorRange{
		fdb.FirstGreaterThan(tuple.Tuple{"age", 30}),
		fdb.FirstGreaterOrEqual(endKey),
	}, fdb.RangeOptions{}).GetSliceOrPanic()

	for _, kv := range r {
		t, _ := tuple.Unpack(kv.Key)
		fmt.Printf("%d = %s\n", t[1].(int64), string(kv.Value))
	}

	return nil, nil
})

And the result:

All ages:
30 = test
31 = test

age > 30:
31 = test

#12

Go Bindings allow you to use just a key prefix to create a range for all keys with that prefix:

 myRange, err := fdb.PrefixRange(prefixKey)
 myRangeIterator := tr.GetRange(myRange, fdb.RangeOptions{}).Iterator()
 for myRangeIterator.Advance() {
     kv := myRangeIterator.MustGet()
     // some code
 }

This should solve your problem.


(A.J. Beamon) #13

This is very similar to the example using a Tuple as a range with a couple differences:

  1. The beginning of the range is inclusive of the prefix, whereas with a tuple range the prefix tuple is itself excluded.
  2. The end key is equal to the begin key with the last byte incremented. This means that every key with the given prefix will be included. With tuple ranges, the range end will be larger than any valid packed tuple with the given tuple prefix, but there are some keys which are not valid tuples that have that prefix and would be excluded.

You can also do the same thing with PrefixRange that I described with t.FDBRangeKeys() if you want to customize the beginning of your range. Specifically, you could create a SelectorRange with your desired beginning and the end copied from the PrefixRange. Or, you could just use Strinc directly to create your end key (this function is apparently not documented, but it returns the first key that doesn’t have the specified key as a prefix).

If you are using exclusively tuples as your keys in the range being queried, then passing in your Tuple as the range is likely the more straightforward approach. If you aren’t using tuples or if you have a mixture of keys that includes some that aren’t tuples, it’ll probably be safer to use the PrefixRange.