FoundationDB

Application design using Subspace and Tuple


(Viacheslav Biriukov) #1

Hi,

I’m a bit confused about proper design for application using Subspaces and Tuples.

I want to create a bunch of indexes for arbitrary number of tables. For example I want to have:

/table/{{name1}}/index/{{key1}}/size
/table/{{name1}}/index/{{key1}}/price
/table/{{name2}}/index/{{key2}}/size
/table/{{name2}}/index/{{key2}}/price

where {{name}} – is a table name, and {{key}} is a name of a primary key in table for a record.

I see 4 following variants to design this:

dir, _ := directory.CreateOrOpen(db, []string{"index"}, nil)

sub1 := dir.Sub(tuple.Tuple{"table", name, "index"})
tr.Set(sub1.Pack(tuple.Tuple{"price"}), price)
// Key: `\x15\x12\x05\x02table\x00\x02name\x00\x02index\x00\x00\x02price\x00``

sub2 := dir.Sub(tuple.Tuple{"table", name, "index", "price"})
tr.Set(sub2, price)
// Key: `\x15\x12\x05\x02table\x00\x02name\x00\x02index\x00\x02price\x00\x00`

sub3 := dir.Sub("table", name, "index")
tr.Set(sub3.Pack(tuple.Tuple{"price"}), price)
// Key: `\x15\x12\x02table\x00\x02name\x00\x02index\x00\x02price\x00`

sub4 := dir.Sub("table", name, "index", "price")
tr.Set(sub4, price)
// Key: `\x15\x12\x02table\x00\x02name\x00\x02index\x00\x02price\x00`

I can see the difference in the key generation outcome, but can’t understand which case would be the best and what drawbacks could be with each of them.

Also I’m a bit confused about first and second examples with two \x00 – one at the end and one in the middle. Don’t understand this logic and why it is needed here.

Thank you.


(Christophe Chevalier) #2

Your code is opening a Directory called “/index”, but then it creates keys that look like (..., ("table", name, "index"), "price")

I think you should open a Directory with a path composed of { "table", name1, "index" } so that you get a very short key prefix per index per table. Then the rest of the key will be a simple tuple (key1, "size").

In you example, the first 2 bytes \x15\x12 corresponds to the prefix allocated to your directory “index”, the rest is the encoded tuple that your are building in fragments, using an intermediate subspace (which does not serve any purpose here).

The double \x00 in the middle come from the embedded tuple ("table", name, "price"): one for the terminator of the string, and the second for the terminator of the embedded tuple. The other samples are variants of the same issue (tuples inside tuples). The \x05 byte is the marker of the start of an embedded tuple.

If you change your code to open a Directory using multiple subfolders, get rid of the dir.Sub(...) and pack tuples with only (key, "size") or (key, "price"), you should get something that looks a lot nicer.

And then, if you replace the strings “size”/“price” by some integer constant, you can compress it even more (at the cost of a little less readability).

To sum up: you should probably create a Directory entry in the directory layer per index per table (if you). Then if you keys are very simple, you can simply tack a simple tuple on top of that without needed any intermediate subspace.

Embedded tuples can be usefull if you want to build coumpound index with a constant size (ex: decomposing a date into a sub-tuple (year, month, day, …))

My go-foo is limited, but if you try something like this, it should produce better results:

dir, _ := directory.CreateOrOpen(db, []string{"table", name, "index"}, nil)
tr.Set(dir.Pack(tuple.Tuple{key1, "price"}), price)
tr.Set(dir.Pack(tuple.Tuple{key1, "size"}), size)
...
tr.Set(dir.Pack(tuple.Tuple{key1, "xyz"}), xyz)

(Viacheslav Biriukov) #3

Thank you for answer!

The double \x00 in the middle come from the embedded tuple (“table”, name, “price”): one for the terminator of the string, and the second for the terminator of the embedded tuple.

I see, but don’t understand why we need this explicit emphasising of a tuple with additional symbols (perhaps it’s out of my question and I’m missing something about this compound index logic with a constant size).

And then, if you replace the strings “size”/“price” by some integer constant, you can compress it even more (at the cost of a little less readability).

Yes, I though about that, but it looks like we save a little space by decreasing readability. So I prefer to have stings here.

But back to design question. What if I also want to have additional layers for values per table (for example with chunks to save values bigger than 100KB). So a new full pseudo schema:

# from above
/table/{{name1}}/index/{{key1}}/size
/table/{{name1}}/index/{{key1}}/price
/table/{{name2}}/index/{{key2}}/size
/table/{{name2}}/index/{{key2}}/price

# additional schema for values with chunks
/table/{{name1}}/values/{{key1}}/0
/table/{{name1}}/values/{{key1}}/1
/table/{{name1}}/values/{{key2}}/0
/table/{{name2}}/values/{{key3}}/0
/table/{{name2}}/values/{{key4}}/0
/table/{{name2}}/values/{{key4}}/1
/table/{{name2}}/values/{{key4}}/2

So in this case, should I use following design:

dirTableName1, _ := directory.CreateOrOpen(db, []string{"table", name1}, nil)

dirName1Index, _ := directory.CreateOrOpen(dirTableName1, []string{"index"}, nil)
tr.Set(dirIndex.Pack(tuple.Tuple{key1, "price"}), price)
tr.Set(dirIndex.Pack(tuple.Tuple{key1, "size"}), size)
tr.Set(dirIndex.Pack(tuple.Tuple{key1, "xyz"}), xyz)

...

dirName1Values, _ := directory.CreateOrOpen(dirTableName1, []string{"values"}, nil)

subKey1 := dirName1Values.Sub(key1)
for ... {
    ...
    tr.Set(subKey1.Pack(tuple.Tuple{i}), chunk)
}

subKey2 := dirName1Values.Sub(key2)
for ... {
    ...
    tr.Set(subKey2.Pack(tuple.Tuple{i}), chunk)
}

… or it’s time to use Subspaces for dirName1Values and dirName1Index? Or Pack using Tuples?

Thank you!


(Christophe Chevalier) #4

I think the “issue” is that dir.Sub(arg1, arg2, ..., argN) creates a tuple (arg1, arg2, .... argN) and then encodes that tuple into binary. If you are already passing a tuple as the first argument to dir.Sub(...) you are in essence creating a tuple with a single element that is itself a tuple. In your last examples, you were passing multiple arguments to Sub(…) and it produced the expected result.

You are having the same issues I had a few years ago, and this is because of a change that happened to the design of tuples, which made things ambiguous with the API at the time.

I think this is a common design flaw which is present in most bindings, and everyone will probably stumble on this when learning about layers.

Story time!

For the first version of directories and subspace, embedded tuples were not supported, and in practice, using them would flatten all the items: so passing a key like (1, (2, 3), 4) would become (1, 2, 3, 4) once encoded or decoded. This was hiding the bug in the code, and everyone took a dependence on this “behavior” as if it was normal.
When embedded tuples were introduced, this immediately blew with the same result that you are seeing. For example, I had to do a breaking change in the .NET binding API, because I was not able to fix this issue due to overload resolution ordering issues between interfaces and generics in C#. My only solution was to change the name of the method that takes items to distinguish it with the method that takes tuples.

In the case of the .NET binding, the convention is now that methods like EncodeKey(T1 arg1, T2 arg2, ..., TN argN) will create a tuple with all the arguments, while methods Pack((T1, T2, ..., TN) tuple) will take a single argument that is the already-created tuple. I think that dir.Sub(..., ..., ...) in the go binding corresponds to my EncodeKey(x1, x2, ...) while you thought that it was Pack(tuple)

To illustrate with the .NET binding:

  • EncodeKey(1, 2, 3, 4) is the equivalent of packing (1, 2, 3, 4) of length = 4.
  • Pack((1, 2, 3, 4)) is the equivalent of packing (1, 2, 3, 4) of length = 4,
  • EncodeKey((1, 2, 3, 4)) is the equivalent of packing ( (1, 2, 3, 4), ) of length = 1,

Subspace, Encodings and Type Systems

I had to do a lot of work in the .NET binding to solve the issue of encoding and decoding keys, so that the application layer does not need to think about all of that, and also to prevent easy mistakes (like you found out the hard way).

I had a few discussions with @dave at the time about a “Type System” that could help Layer implementors deal with this. I implemented a few of the ideas in the .NET binding, in the form of IKeyEncoding / IKeyEncoder<T>, and other variants (IDynamicKeyEncoder, IKeyEncoder<T1, T2, ...> and so on).

These types handle all the business of encoding keys (single field or composite) into binary and back. They offer different guarantees: On implementations uses the Tuple Encoding and is used by default (for compatibility with the other bindings). Other implementations can use something like protobuf (more compact, but no ordering guarantees, etc…).

You can then combine a Subspace (a pre-computed binary prefix) and a Key Encoder to create a “Key Space” that does both: encode logical keys using some encoding scheme, and prepend the binary suffix automatically.

Some KeySpaces are dynamically typed, like DynamicKeySubspace while others are statically typed like TypedKeySubspace<string, int, Uuid128>, and the layer code can decide which one to use depending on circumstances, personal choice or risk tolerance :slight_smile:

I don’t think that other bindings went that far, so you are probably stuck at the basic level of combining binary prefix (subspaces) with packed tuples yourself.


(Christophe Chevalier) #5

Yes, the typical solution is to add an additional integer after your keys, with 0, 1, … for each chunk.

Now, instead of doing a get(key) to read a value, you do a get_range(KeyRange.startsWith(key)) and glue back the chunks into a single buffer, and then parse the value. Likewise, use clear_range instead of clear to remove them.

For values that are less than 100 KB, I suggest still adding a chunk index of 0, so that you don’t have multiple code path to maintain (ie: return arrays of values, of length 0 if not found, 1 if small value, >= 2 if large value, etc…).

See my answer above regarding type systems and encoding.

Subspaces and Tuples, currently, are just temporary variables that are use to skip recomputing the same binary prefix over and over again.

If all the keys for a single document will look like (...., "FOO", "BAR", "KEY123", {FIELD} ) and you have dozens of fields, it can be faster to create a subspace for (...., "FOO", "BAR", "KEY123") to pre-compute that part of the key, and then use the subspace to only append the field name for each set. Internally, it will only encode the field, and concat the bytes with the subspace prefix. Less work to do.

Note: a “subspace” is just an object that wraps an already computed binary prefix for your keys. If you create a subspace and only use it once or twice, this is a bit wasteful. You should only see it as a cached temporary variable, and use it for performance reason.

A million of milliseconds is a lot of time :slight_smile:

I prefer solving the readability issue with the key encoders and type system in the code. I usually create one or more typed encoder in my layers, by specifying the exact number of fields and their types. That way, I get full intellisense in my code, and don’t mix fields by mistake. It is common to have keys that are (string, string, string, int, int) which are a nightmare for me.

With typed encoders, I reduce the number of errors, and they can also be used to stringify the keys into a human-friendly string (for logs, debugger, etc…).

This is perhaps because I’m using a statically typed language and want type safety. A programmer that uses dynamic languages would probably prefer having a dynamic key encoder (like tuple currently is) but this looks too scary to me :cold_sweat: I like my compiler errors when I mess up the ordering of things :slight_smile:


(Viacheslav Biriukov) #6

Thank you, for such detailed answer and history overview. =)

Good point, got it.

Now it’s cleaner to me how to choose and build keys layout.

I think I’ll start with a simple subspaces logic. Will reuse pre-generated subspaces for saving computation time, as you mentioned above.

Good point, also. Encoder is a good thing to have here.


(Pontus Lundin) #7

Hi!

How do i get a Range of a “nested” Dir ? (as per your example)

UserScoreDir, err := directory.CreateOrOpen(db, []string{“user”,“score”}, nil)

rr := tr.GetRange(fdb.KeyRange{UserScoreDir.Pack(tuple.Tuple{""}), UserScoreDir.Pack(tuple.Tuple{0xFF})}, fdb.RangeOptions{})
ri := rr.Iterator()

I always end up with:
the database has keys stored at the prefix chosen by the automatic prefix allocator: []

Has it automatically allocated a prefix for the path that i need to specify ?
Thanks!


(Christophe Chevalier) #8

The Directory Layer allocates a random prefix the first time the directory is created, and nested directories will have a completely different prefix from their parent directory (just like on a file system, folders are hierarchical but on the disk their are allocated on any sector that was free at the time).

In your sample, I think the confusion comes from the fact that you are using the tuple encoding instead of binary encoding for the keys: tuple.Tuple("") will encode to '02 00' (basically, the empty string), and tuple.Tuple(0xFF) will encode to '15 FF' (the integer 255). So you are only reading the range of (xxx.02.00)..(xxx.15.FF), basically saying for all the keys from the empty string to the integer 254 which does not make much sense.

If you want to read the entire content of the directory, there should be a method on the directory instance that is named something like ToRange() (will depend on the language and binding you are using) which should return a key range. It should automatically add the 0xFF at the end, so you don’t need to do it yourself.

For directories that use the tuple encoding, usually if the prefix is ‘abc’, if will return a range 'abc' <= k < 'abc\xFF', because the tuple encoding never produces keys starting with 0xFF.

But if you are using a different encoding, then you should use 'abc' <= k < 'abd' which increments the last byte.

Most language and bindings have a type named something like KeyRange with various methods like “startsWith” of “prefixedBy” that should do this for you.


(Pontus Lundin) #9

Many thanks Christophe your FDB insights, very valuable so thanks for sharing =)
For anyone using Go bindings i ended up using this method
s, e := UserScoreDir.FDBRangeKeys() //start and end key
within
rr := tr.GetRange(fdb.KeyRange{s, e}, fdb.RangeOptions{})

Also, my DB was a mess i had tried various setters on various paths and prefix. Wiped it all it all out while another processed was using it…it made it more or less corrupt :slight_smile: Needles to say i had to uninstall and install it again and now it works as expected…but hey it is a local dev machine so lesson learned =)

Thanks!