Directory.pack() & subspace interaction

Suppose we do:

set(directory.pack(("a_b_c", "1")), "")
set(directory.pack(("a_b_c", "2")), "")
set(directory.pack(("a_b_c_d", "3")), "")

Then we take directory.subspace("a_b_c") and look at its keys.

Is the "correct" behavior that it returns only
pack("a_b_c", "1")
pack("a_b_c", "2")
or is
pack("a_b_c_d", "3") also considered part of the subspace?

In other words, I'm trying to understand whether pack is just a string concat, or whether something else is going on. This detail matters to me because I'm trying to understand how range queries work. Thanks.

The last KV won't be included in the subspace. If you look at the tuple encoding scheme, you'll see why: each string is terminated with a \x00 byte, so the encoding of "a_b_c" is not a prefix of the encoding of "a_b_c_d".
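To make that concrete, here is a minimal sketch in plain Python. `pack_strings` is a hypothetical stand-in for the tuple layer's pack, not the real API: it handles only unicode strings, and it leaves out the directory prefix, which is identical for every key in the same directory and so does not affect the comparison.

```python
def pack_strings(items):
    """Hypothetical stand-in for tuple packing; unicode strings only."""
    out = b""
    for s in items:
        # Type code 0x02, UTF-8 body with embedded nulls escaped, 0x00 terminator.
        out += b"\x02" + s.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"
    return out

subspace_prefix = pack_strings(("a_b_c",))
k1 = pack_strings(("a_b_c", "1"))
k3 = pack_strings(("a_b_c_d", "3"))

print(k1.startswith(subspace_prefix))  # True
print(k3.startswith(subspace_prefix))  # False
```

The prefix ends in the \x00 terminator, while the third key has "_" at that position, so a range read over the subspace skips it.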

If you want "a_b_c_d" to be included, I suggest you encode each piece as a separate string: ("a", "b", "c", …)
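A quick sketch of why splitting the pieces works, again using a hypothetical `pack_strings` helper that covers only unicode strings and omits the directory prefix: each element is terminated on its own, so a shorter tuple's encoding is a byte prefix of any longer tuple that starts with the same elements.

```python
def pack_strings(items):
    """Hypothetical stand-in for tuple packing; unicode strings only."""
    out = b""
    for s in items:
        out += b"\x02" + s.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"
    return out

parent = pack_strings(("a", "b", "c"))
under_c = pack_strings(("a", "b", "c", "1"))
under_d = pack_strings(("a", "b", "c", "d", "3"))

# Both keys extend the parent tuple, so both fall inside its prefix range.
print(under_c.startswith(parent), under_d.startswith(parent))  # True True
```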

Interesting, this saves me a lot of work. So the crux of the construction is:

Byte String

Typecode: 0x01
Length: Variable (terminated by [\x00]![\xff])
Encoding: b'\x01' + value.replace(b'\x00', b'\x00\xFF') + b'\x00'
Test case: pack("foo\x00bar") == b'\x01foo\x00\xffbar\x00'
Status: Standard

In other words, byte strings are null terminated with null values occurring in the string escaped in an order-preserving way.

Unicode String

Typecode: 0x02
Length: Variable (terminated by [\x00]![\xff])
Encoding: b'\x02' + value.encode('utf-8').replace(b'\x00', b'\x00\xFF') + b'\x00'
Test case: pack( u"F\u00d4O\u0000bar" ) == b'\x02F\xc3\x94O\x00\xffbar\x00'
Status: Standard

This is the same way that byte strings are encoded, but first, the unicode string is encoded in UTF-8.

In particular, the \x00 terminator and the \x00\xFF escape for embedded nulls are the parts that matter here.
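For anyone who wants to check the two quoted test cases, here is a small sketch that transcribes the Encoding lines directly. The function names `encode_bytes` and `encode_unicode` are mine, not part of any library, and the ordering check at the end is only a spot check on three values.

```python
def encode_bytes(value: bytes) -> bytes:
    # Type code 0x01, nulls escaped as 00 FF, then the 00 terminator.
    return b"\x01" + value.replace(b"\x00", b"\x00\xff") + b"\x00"

def encode_unicode(value: str) -> bytes:
    # Same scheme with type code 0x02, applied to the UTF-8 bytes.
    return b"\x02" + value.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"

bytes_case = encode_bytes(b"foo\x00bar")
unicode_case = encode_unicode("F\u00d4O\u0000bar")
print(bytes_case)
print(unicode_case)

# Spot check: sorting by encoded form matches sorting the raw values.
values = [b"ab", b"a\x00", b"a"]
order_preserved = sorted(values) == sorted(values, key=encode_bytes)
print(order_preserved)
```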

Thanks!