Ranges returned by `Subspace#range` are inclusive on both ends


I’ve noticed that `Subspace#range` returns a range that is in fact inclusive on both ends:

    var s = new Subspace(new byte[] {(byte)0xf0});
    var r = s.range();
    System.out.println(hex(r.begin)); // 0xf000
    System.out.println(hex(r.end)); // 0xf0ff

This is made clear enough by the Javadoc:

Gets a Range representing all keys strictly in the Subspace.

However, I found that the documentation for ReadTransaction#getRange is quite misleading:

Gets an ordered range of keys and values from the database. The begin and end keys are specified by byte[] arrays, with the begin key inclusive and the end key exclusive. Ranges are returned from calls to Tuple.range() and Range.startsWith(byte[]).

Yes, this says that the end is exclusive, but it also suggests that you can read a whole subspace with transaction.getRange(subspace.range()). In fact, that call does not read the very last key (0xf0ff in this case). Why is this so? Should the documentation be more explicit that this pattern does not work exactly as expected? It seems natural to use it to iterate over a whole subspace (and in fact I’ve found it in multiple places in our codebase…), so it is unfortunate that doing so misses the last key.
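To make the behaviour concrete without a cluster, here is a minimal sketch that models a begin-inclusive, end-exclusive read over an in-memory sorted map. It is not the FoundationDB client: the class `HalfOpenScanDemo` and its helpers `scan`, `bytes` and `sampleStore` are made up for this illustration, and it assumes Java 9+ for `Arrays.compareUnsigned`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** In-memory stand-in for an ordered key-value store; NOT the FoundationDB client. */
final class HalfOpenScanDemo {
    // Order keys as unsigned bytes, lexicographically.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    /** Builds a byte[] from int literals so that 0xf0 and 0xff need no casts. */
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    /** Returns the values whose keys k satisfy begin <= k < end. */
    static List<String> scan(TreeMap<byte[], String> store, byte[] begin, byte[] end) {
        return new ArrayList<>(store.subMap(begin, true, end, false).values());
    }

    /** Three keys under the 0xf0 prefix plus one key outside it. */
    static TreeMap<byte[], String> sampleStore() {
        TreeMap<byte[], String> store = new TreeMap<>(UNSIGNED);
        store.put(bytes(0xf0, 0x00), "first");
        store.put(bytes(0xf0, 0x01), "second");
        store.put(bytes(0xf0, 0xff), "last");
        store.put(bytes(0xf1), "outside");
        return store;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> store = sampleStore();
        // End key 0xf0ff, as in the range printed above: the key 0xf0ff itself is skipped.
        System.out.println(scan(store, bytes(0xf0, 0x00), bytes(0xf0, 0xff))); // [first, second]
        // End key 0xf1, one past the whole prefix: 0xf0ff is included.
        System.out.println(scan(store, bytes(0xf0, 0x00), bytes(0xf1))); // [first, second, last]
    }
}
```

The only difference between the two reads is the end key, which is what the question above is about.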

This is indeed interesting:

    fdb> set \x00\x00 1
    Committed (349243765927)
    fdb> set \x00\x01 2
    Committed (349246846354)
    fdb> set \x00\xff 3
    Committed (349250806671)
    fdb> set \x01 4
    Committed (349440253516)
    fdb> getrange \x00\x00 \x00\xff  <- the end the range returns

    Range limited to 25 keys
    `\x00\x00' is `1'
    `\x00\x01' is `2'

    fdb> getrange \x00\x00 \x01 <- the proper end to use?

    Range limited to 25 keys
    `\x00\x00' is `1'
    `\x00\x01' is `2'
    `\x00\xff' is `3'

However, I think that if you always use the tuple encoding for your keys, this situation (the first byte after the subspace prefix being \xff) cannot happen: that byte is the type code of the first tuple item, and no type code maps to \xff:

    private static final byte nil                   = 0x00;
    private static final byte BYTES_CODE            = 0x01;
    private static final byte STRING_CODE           = 0x02;
    private static final byte NESTED_CODE           = 0x05;
    private static final byte INT_ZERO_CODE         = 0x14;
    private static final byte POS_INT_END           = 0x1d;
    private static final byte NEG_INT_START         = 0x0b;
    private static final byte FLOAT_CODE            = 0x20;
    private static final byte DOUBLE_CODE           = 0x21;
    private static final byte FALSE_CODE            = 0x26;
    private static final byte TRUE_CODE             = 0x27;
    private static final byte UUID_CODE             = 0x30;
    private static final byte VERSIONSTAMP_CODE     = 0x33;
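As a quick sanity check of that argument, the sketch below takes the type codes from the listing above and verifies that a key made of a prefix plus any of them sorts strictly before the prefix plus \xff. The class `TupleTypeCodeCheck` and its methods are hypothetical names for this illustration only, it covers just the codes quoted here rather than everything the bindings may define, and it assumes Java 9+ for `Arrays.compareUnsigned`.

```java
import java.util.Arrays;

/** Checks the type codes quoted in the thread against the end key prefix + 0xff. */
final class TupleTypeCodeCheck {
    // Copied from the listing above; this array exists only for the sketch.
    static final int[] LISTED_TYPE_CODES = {
        0x00, 0x01, 0x02, 0x05, 0x14, 0x1d, 0x0b, 0x20, 0x21, 0x26, 0x27, 0x30, 0x33
    };

    /** True if prefix + firstByte sorts strictly before prefix + 0xff in unsigned byte order. */
    static boolean insideSubspaceRange(byte[] prefix, int firstByte) {
        byte[] key = Arrays.copyOf(prefix, prefix.length + 1);
        key[prefix.length] = (byte) firstByte;
        byte[] end = Arrays.copyOf(prefix, prefix.length + 1);
        end[prefix.length] = (byte) 0xff;
        return Arrays.compareUnsigned(key, end) < 0;
    }

    /** True if every listed type code yields a key inside the half-open range. */
    static boolean allListedCodesInside(byte[] prefix) {
        for (int code : LISTED_TYPE_CODES) {
            if (!insideSubspaceRange(prefix, code)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] prefix = {(byte) 0xf0};
        System.out.println(allListedCodesInside(prefix));      // true
        System.out.println(insideSubspaceRange(prefix, 0xff)); // false: a raw 0xff byte hits the end key
    }
}
```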

Ideally the Subspace.range() method should have used something like this

I think this is the key misunderstanding here. The key \xf0\xff should not be placed into the subspace.

Subspaces are meant to contain the same keyspace that you can store if you are not in a Subspace. If you are not in a Subspace, then keys starting with \xff are not allowed as that is the reserved system space of FDB. While keys could start with \xff after any non-empty prefix, it’s not a supported use case.

This is why the range covering all keys in a Subspace with prefix P runs from <P>\x00 to <P>\xff. From the Subspace perspective, this range is inclusive of all keys expected to be in the Subspace. The raw range passed to the raw-key getRange() is exclusive of the end key, but that does not exclude any expected key, because keys that begin with \xff after the prefix are not a supported use case.
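If someone does store raw (non-tuple) keys under a prefix and needs every key that starts with it, the end key has to sort after the whole prefix, as in the `getrange \x00\x00 \x01` call earlier in the thread. Here is a minimal sketch of computing such an end key by hand. `PrefixRangeSketch`, `strinc` and `hex` are defined locally for illustration; as far as I know the Java bindings offer the same idea through `Range.startsWith(byte[])`, which the getRange Javadoc quoted above mentions, but treat the exact equivalence as an assumption and check the Javadoc.

```java
import java.util.Arrays;

/** Hand-rolled prefix range end; a sketch, not the bindings' implementation. */
final class PrefixRangeSketch {
    /**
     * Returns the smallest key that sorts after every key starting with prefix:
     * drop trailing 0xff bytes, then increment the last remaining byte.
     */
    static byte[] strinc(byte[] prefix) {
        int len = prefix.length;
        while (len > 0 && prefix[len - 1] == (byte) 0xff) {
            len--;
        }
        if (len == 0) {
            throw new IllegalArgumentException("prefix must contain a byte other than 0xff");
        }
        byte[] out = Arrays.copyOf(prefix, len);
        out[len - 1]++;
        return out;
    }

    /** Lower-case hex rendering, two digits per byte. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(strinc(new byte[] {(byte) 0xf0})));              // f1
        System.out.println(hex(strinc(new byte[] {0x00})));                     // 01
        System.out.println(hex(strinc(new byte[] {(byte) 0xf0, (byte) 0xff}))); // f1
    }
}
```

Note that a range beginning at the bare prefix also includes the prefix key itself, which the subspace range (beginning at the prefix plus \x00) does not.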