I’m currently adding support for VersionStamps to the .NET Binding.
I’m a bit confused by some discrepancies between the documentation, the code implementation, and the behaviour I’m seeing. Also, I was not able to find much documentation on this subject, and no concrete examples.
10 bytes or 12 bytes?
UPDATED from answers below.
There are two flavors of Versionstamps: 80-bit and 96-bit. The former is what the database understands, and the latter is a client-side convention.
- The 80-bit versionstamps are 10 bytes long, composed of 8 bytes (Transaction Version) followed by 2 bytes (Transaction Batch Order). They are ordered and guaranteed to be unique per transaction. They are handled by the FDB_MUTATION_TYPE_SET_VERSIONSTAMPED_KEY and FDB_MUTATION_TYPE_SET_VERSIONSTAMPED_VALUE atomic operations.
- The 96-bit versionstamps are an 80-bit versionstamp followed by 2 extra bytes called the User Version, for 12 bytes in total. These two bytes can be used if a transaction wants to insert more than one key. They are not seen by the database, and are just a convention at the binding level.
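To make the layout concrete, here is a minimal C# sketch (the helper name is mine, not part of any binding) of the 12-byte incomplete form:

static byte[] IncompleteVersionstamp96(ushort userVersion)
{
    // The first 10 bytes (8-byte Transaction Version + 2-byte Batch Order)
    // are placeholders that the database will overwrite at commit time.
    var bytes = new byte[12];
    for (int i = 0; i < 10; i++) bytes[i] = 0xFF;
    // The 2-byte User Version is chosen by the client; big-endian so that
    // keys produced by the same transaction sort in user-version order.
    bytes[10] = (byte)(userVersion >> 8);
    bytes[11] = (byte)(userVersion & 0xFF);
    return bytes;
}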
Both the Python and Java bindings seem to have taken the route of only exposing 96-bit versionstamps at the API level, using a default User Version of 0. So these versionstamps will always be 12 bytes; the shorter 10-byte ones are not exposed.
When the application wants to create a key using a versionstamp, it does not know the actual value yet, so the pattern is to use a placeholder stamp, which gets overwritten when setting a versionstamped key or value at commit time. These stamps are called incomplete stamps in some bindings.
The binding tracks the offset in the binary key or value where this placeholder stamp is located, and passes this info to the database, which then replaces the bytes with the actual stamp. After the transaction has committed, the application can query the actual stamp used via fdb_transaction_get_versionstamp.
So for example, if your layer uses keys like ('foo', <stamp>) encoded with the Tuple Layer, the serialized binary key would look something like this:
('foo', <placeholder_stamp>) => < 02 'foo' 00 33 xx xx xx xx xx xx xx xx xx xx 00 00 >
note: currently, bindings use xx = FF as the placeholder bytes, but this could be anything
The prefix 02 'foo' 00 corresponds to the encoding of the 'foo' string. The byte 33 is the type header for 96-bit versionstamps in the Tuple Encoding and is not part of the stamp. The 10 'xx' are placeholders for where the actual stamp will be written, and the last 00 00 are the user version (0 by default).
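For illustration, a rough C# sketch (hand-rolled, not an actual binding API) that builds this serialized key and tracks where the placeholder starts:

// tuple prefix: 02 'foo' 00, followed by the 33 type header
var prefix = new byte[] { 0x02, (byte)'f', (byte)'o', (byte)'o', 0x00, 0x33 };
int stampOffset = prefix.Length; // = 6: the 10 placeholder bytes start right after the 33 header
var key = new byte[prefix.Length + 12];
prefix.CopyTo(key, 0);
IncompleteVersionstamp96(0).CopyTo(key, prefix.Length); // helper from the sketch above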
When calling the SetVersionstampedKey method, you need to pass an additional value: the offset in the key where the stamp is located. This is done by appending 2 extra bytes at the end of the key, containing the offset in little-endian. These 2 bytes are not actually part of the key, and will be removed by the SetVersionstampedKey method.
Since the location of the stamp in our example above is at offset 6, the actual byte array passed to the SetVersionstampedKey method will be:
tr.SetVersionstampedKey( < 02 'foo' 00 33 xx xx xx xx xx xx xx xx xx xx 00 00 06 00>, 'hello world')
At commit time, the last two bytes are removed, and the 10 bytes at the specified offset are filled in by the database. If the transaction commits at version 0x0123456789ABCDEF with batch order 0x1234, the key will become:
< 02 'foo' 00 33 01 23 45 67 89 AB CD EF 12 34 00 00> = 'hello world'
In practice, the batch order will usually be 0 or a low number (depending on the number of concurrent transactions).
The first 10 bytes are controlled by the database, and the last two bytes (‘00 00’) are controlled by the user.
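A sketch of the offset-suffix step, assuming the 2-byte little-endian suffix described above (hypothetical helper, not the actual binding code):

// Append the 2-byte little-endian offset that tells the database where to
// write the stamp; the database strips this suffix, so it is never part of
// the stored key.
static byte[] WithStampOffset(byte[] key, int stampOffset)
{
    var buf = new byte[key.Length + 2];
    key.CopyTo(buf, 0);
    buf[key.Length] = (byte)(stampOffset & 0xFF);   // low byte first (little-endian)
    buf[key.Length + 1] = (byte)(stampOffset >> 8); // high byte
    return buf;
}

// usage, with the offset 6 from the example above:
// tr.SetVersionstampedKey(WithStampOffset(key, 6), value);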
Original question:
Custom Serialization Required
The way Java and Python deal with Versionstamps and tuples is a little bit… weird. I’m not a big fan of needing a custom method to build tuples that contain a versionstamp (due to the need to get the byte offset where it starts). This does not seem to play well with other serialization mechanics (for example when combined with subspace prefixes, or other custom encodings).
I was wondering if another approach would be better: using a specific byte pattern to mark the location of a stamp (client side). When such a byte array is passed to the VersionStampKey mutation, it would look for this pattern and obtain the position that way. => no need for special code paths; any binary encoding scheme can simply output this pattern anywhere it wants, and it will be recognized at the last step.
The obvious problem is: what if this pattern appears in the key itself by random chance? It cannot be something trivial like all zeros, or all FF. Maybe $VERSTAMP!$ or something like that?
There is precedent with, for example, the multipart content encoding (RFC 1341 (MIME): 7 The Multipart content type), which explicitly defines the expected chunk separator. Implementations may choose a constant (or random) separator, and check that it is not contained in the message itself. If it is, they choose another marker.
We could maybe choose a default token to mark the spot where a VersionStamp is, but have a mechanism (somewhere on the transaction? or as an extra parameter to the VersionStampKey helper method) to specify the exact token used.
// use default token
var key = AcmeLib.SerializeKey(("foo", 123, VersionStamp.Incomplete(42), 456));
// -> <'foo',123,$VERSTAMP!$42,456>
tr.VersionStampKey(key, ....);
// risk of collision
var key = AcmeLib.SerializeKey(("foo", 123, "Oh no, I have a $VERSTAMP!$ inlined", VersionStamp.Incomplete(42), 456), token: "ABCDEFGHIJ");
// -> <'foo',123,'Oh no, I have a $VERSTAMP!$ inlined',ABCDEFGHIJ42,456>
tr.VersionStampKey(key, ..., token: "ABCDEFGHIJ");
We could even decide to generate a random token per transaction, and ensure that it does not appear twice in the same key. If it does, the transaction would fail, retry (with a NEW random token), and the probability that the next token is also contained in another key of the same transaction would be very low.
db.Run((tr) =>
{
var token = tr.GetStampToken(); // -> "Aoew!4='£K"
var key = AcmeLib.Serialize(("foo", 123, VersionStamp.Incomplete(123), 456), token);
// -> < 'foo',123,Aoew!4='£K42,456 >
tr.VersionStampKey(key, ....);
});
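The scanning step itself would be straightforward; here is a sketch of what the binding could do internally (all names hypothetical, this is the proposal, not an existing API):

// Find the unique position of the token inside the key.
// Failing on duplicates is what would trigger the retry-with-new-token path.
static int FindStampOffset(byte[] key, byte[] token)
{
    int found = -1;
    for (int i = 0; i + token.Length <= key.Length; i++)
    {
        bool match = true;
        for (int j = 0; j < token.Length; j++)
        {
            if (key[i + j] != token[j]) { match = false; break; }
        }
        if (!match) continue;
        if (found >= 0) throw new InvalidOperationException("Token found twice: retry with a new random token");
        found = i;
    }
    if (found < 0) throw new ArgumentException("Token not found in key");
    return found;
}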
fdb_transaction_get_versionstamp
UPDATED
This method can be used to obtain the actual value that the database will insert into the key (or value) in place of the temporary placeholder. This value is an 80-bit value that is the same for the whole transaction. If the transaction needs multiple ids, the way to get them is to use a 96-bit versionstamp, with the last 16 bits being a user-provided integer.
This method must be called BEFORE the call to fdb_transaction_commit, and the Future will be resolved AFTER the transaction commits successfully (or fails).
An example of Java code:
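(A minimal sketch, using the Java binding’s Transaction.getVersionstamp(), which returns a CompletableFuture<byte[]>:)

// request the versionstamp future BEFORE committing
CompletableFuture<byte[]> stampFuture = tr.getVersionstamp();
// commit; the future only resolves after the commit has succeeded (or failed)
tr.commit().join();
byte[] stamp = stampFuture.join(); // the 10-byte (80-bit) stamp used by this transaction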
This may have some impact on code that uses async operations (.NET, TypeScript with async/await, Java with CompletableFutures, etc.), especially when combined with retry loops (where the code does not manage the transaction itself, and in particular is not the one that invokes the commit method).
Example of a pattern that will fail:
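For instance, in a .NET-style retry loop (RunAsync and GetVersionstampAsync are placeholder names here), the handler runs before the retry loop commits, so awaiting the stamp inside the handler waits on a commit that has not happened yet:

// BROKEN: awaiting the stamp inside the retry handler
await db.RunAsync(async (tr) =>
{
    tr.SetVersionstampedKey(key, value);
    var stamp = await tr.GetVersionstampAsync(); // never resolves here: commit runs only after this lambda returns
});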
Some possible solutions for Java:
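One possibility (a sketch only; Transaction.mutate() and getVersionstamp() are the real Java binding methods, but the retry-loop interaction is my assumption) is to capture the future from inside run(), and only join it once run() has returned, i.e. after the successful commit:

AtomicReference<CompletableFuture<byte[]>> stampRef = new AtomicReference<>();
db.run(tr -> {
    tr.mutate(MutationType.SET_VERSIONSTAMPED_KEY, key, value);
    stampRef.set(tr.getVersionstamp()); // do NOT join here: commit has not happened yet
    return null;
});
// safe: run() only returns after a successful commit, and each retry
// overwrote the reference, so this future belongs to the committed attempt
byte[] stamp = stampRef.get().join();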
Original question: