Multiple versionstamps in same key/value?

In the Java API, as far as I can tell it’s only possible to write a single incomplete versionstamp in a single key/value write. Is that correct?

I find that constraint very limiting, and would really like the ability to write multiple versionstamps in the same write. Is that likely to ever change?

My use case is that I want to represent entity IDs as versionstamps, and also store relationships between entities. With this limitation it’s not possible to write two new entities and a relationship between them in the same transaction.

Yes, that’s right: you can’t have more than one incomplete versionstamp in a single key or value. This limitation is baked into the FDB C API that the Java API is built on top of, so to change this, we’d need to do something like change the way the versionstamp atomic operations work.

For reference, the current scheme for a versionstamped key or value is that when using the SET_VERSIONSTAMPED_KEY or SET_VERSIONSTAMPED_VALUE atomic operations, the last 4 bytes of the key or value are stripped from the bytes given and interpreted as a little-endian integer. This integer is then treated as the (zero-indexed) position within the remaining bytes at which the 10-byte versionstamp is written, overwriting the placeholder bytes there.

So if you had something like:

\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x01\x00\x00\x00

Then those last four bytes \x01\x00\x00\x00 get treated as position 1. If the versionstamp is something like \x00\x00\x00\x00\x5c\xa1\xab\x1e\x0f\xdb, then the new key/value becomes:

\x00\x00\x00\x00\x00\x5c\xa1\xab\x1e\x0f\xdb\x0b\x0c\x0d\x0e\x0f\x10

But as you can see, this serialization format only allows one versionstamp to be serialized into it.
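To make the byte shuffling concrete, here is a minimal Python sketch that emulates the offset handling described above on plain byte strings. It does not call FDB at all; the function name apply_versionstamp is made up for illustration, and the bounds checks are my own assumption about sensible behavior rather than a description of what the server does.

```python
import struct

VERSIONSTAMP_LEN = 10  # a versionstamp is 10 bytes long


def apply_versionstamp(param: bytes, versionstamp: bytes) -> bytes:
    """Emulate the single-versionstamp scheme (illustration only, not FDB code)."""
    if len(versionstamp) != VERSIONSTAMP_LEN:
        raise ValueError("versionstamp must be 10 bytes")
    if len(param) < 4:
        raise ValueError("parameter is too short to hold the 4-byte offset")
    # Strip the last four bytes and read them as a little-endian integer.
    body, suffix = param[:-4], param[-4:]
    (pos,) = struct.unpack("<I", suffix)
    if pos + VERSIONSTAMP_LEN > len(body):
        raise ValueError("versionstamp does not fit at the given offset")
    # Overwrite the 10 placeholder bytes starting at the zero-indexed position.
    return body[:pos] + versionstamp + body[pos + VERSIONSTAMP_LEN:]


# The example from this post: 17 bytes \x00..\x10 followed by offset 1.
param = bytes(range(0x00, 0x11)) + struct.pack("<I", 1)
stamp = bytes.fromhex("000000005ca1ab1e0fdb")
result = apply_versionstamp(param, stamp)
```

With the inputs above, result is the first byte \x00, then the ten versionstamp bytes, then the untouched tail \x0b through \x10.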

To generalize this to more versionstamps, you could imagine introducing new atomic operations like SET_MULTI_VERSIONSTAMPED_KEY and SET_MULTI_VERSIONSTAMPED_VALUE, or perhaps a single atomic operation MULTI_VERSIONSTAMPED_SET, where the serialization format was something like:

  1. Start with a byte prefix that will become the final key/value, with dummy bytes wherever a versionstamp will be filled in
  2. After the prefix, encode a four-byte little-endian integer offset for each incomplete versionstamp
  3. At the end, suffix the byte array with a single byte containing the number of versionstamps

This has some nice properties like:

  • Vanilla sets are equivalent to this scheme, but with the value \x00 appended at the end
  • Versionstamp operations in the older encoding can be converted to the new encoding by appending \x01 at the end
  • The \x00 suffix can be leveraged to allow the same MULTI_VERSIONSTAMPED_SET operation to be used for versionstamped key and value operations

This scheme is limited to 255 versionstamps in each single key/value (the largest count a single byte can hold), which some use cases might find limiting. Of course, that could be increased by either increasing the number of terminating bytes or by using, say, a variable-length encoding scheme.
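As a rough sketch of how the proposed format could be encoded and resolved, here is a pure-Python emulation on byte strings. None of this exists in FDB: encode_multi and apply_multi are hypothetical names, and I’m assuming every offset receives the same 10-byte transaction versionstamp.

```python
import struct
from typing import Sequence

VERSIONSTAMP_LEN = 10  # a versionstamp is 10 bytes long


def encode_multi(body: bytes, offsets: Sequence[int]) -> bytes:
    """Append the 4-byte little-endian offsets, then a one-byte count (hypothetical format)."""
    if len(offsets) > 255:
        raise ValueError("a single count byte can describe at most 255 versionstamps")
    packed = b"".join(struct.pack("<I", offset) for offset in offsets)
    return body + packed + bytes([len(offsets)])


def apply_multi(param: bytes, versionstamp: bytes) -> bytes:
    """Resolve the hypothetical multi-versionstamp format into the final bytes."""
    if len(versionstamp) != VERSIONSTAMP_LEN:
        raise ValueError("versionstamp must be 10 bytes")
    if not param:
        raise ValueError("parameter must end with a count byte")
    count = param[-1]
    body_end = len(param) - 1 - 4 * count
    if body_end < 0:
        raise ValueError("parameter is too short for the declared count")
    body = bytearray(param[:body_end])
    offsets = struct.unpack(f"<{count}I", param[body_end:-1])
    for pos in offsets:
        if pos + VERSIONSTAMP_LEN > len(body):
            raise ValueError("versionstamp does not fit at the given offset")
        # Assumption: the same transaction versionstamp is written at each offset.
        body[pos:pos + VERSIONSTAMP_LEN] = versionstamp
    return bytes(body)


stamp = bytes.fromhex("000000005ca1ab1e0fdb")
placeholder = b"\x00" * VERSIONSTAMP_LEN
two_stamps = encode_multi(b"A" + placeholder + b"B" + placeholder + b"C", [1, 12])
resolved = apply_multi(two_stamps, stamp)
```

This also shows the compatibility properties listed above: a parameter ending in \x00 resolves to the bytes before it unchanged, and a parameter in the older encoding with \x01 appended resolves exactly as it does today.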

Adding a new FDB atomic operation is fairly straightforward, though it would require an FDB API change. The standard procedure would be to introduce it in the list of mutations in fdb.options. This exposes it to all of the bindings, and then code to interpret it needs to be added to the server. You’d probably want to also update the Tuple class so that it supports serializing Tuples with multiple versionstamps using the new scheme.


If you didn’t want to do that, you could potentially modify your application’s serialization scheme so that it needs only one incomplete versionstamp. For example, I’m not sure how you’re encoding your entity IDs, but if it’s something like:

(entity 1 versionstamp, entity 1 suffix, entity 2 versionstamp, entity 2 suffix, entity 3 versionstamp, entity 3 suffix)

Then say that you want entity 1 and entity 2 to have the same incomplete versionstamp, and you want entity 3 to have some other versionstamp (because, say, it was written in a different transaction). You could do something like:

(shared versionstamp, null, entity 1 suffix, null, entity 2 suffix, entity 3 versionstamp, entity 3 suffix)

The nulls in the entity 1 and entity 2 versionstamp locations then indicate that those entities should use the single shared versionstamp, while entity 3 uses the versionstamp given explicitly.
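On the read side, resolving that layout could look roughly like the Python sketch below. It works on an already-unpacked tuple, with plain bytes standing in for versionstamps and None for null; resolve_entity_ids is a hypothetical helper, and a real application would presumably go through the Tuple layer instead.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_entity_ids(record: Sequence[Optional[bytes]]) -> List[Tuple[bytes, bytes]]:
    """Expand (shared, stamp-or-None, suffix, ...) into (versionstamp, suffix) pairs."""
    if not record or record[0] is None:
        raise ValueError("record must start with the shared versionstamp")
    shared, rest = record[0], record[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected (versionstamp-or-None, suffix) pairs after the shared versionstamp")
    ids = []
    for i in range(0, len(rest), 2):
        stamp, suffix = rest[i], rest[i + 1]
        # None means "use the shared versionstamp from the front of the record".
        ids.append((shared if stamp is None else stamp, suffix))
    return ids


shared_stamp = bytes.fromhex("000000005ca1ab1e0fdb")
older_stamp = bytes.fromhex("000000005ca1ab1d0001")
record = (shared_stamp, None, b"e1", None, b"e2", older_stamp, b"e3")
entity_ids = resolve_entity_ids(record)
```

Here entity 1 and entity 2 both come back with shared_stamp, and entity 3 keeps older_stamp.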

This only works, though, if you don’t care about the ordering (so it is probably better suited to values than keys). If you wanted, say, an index on (entity 1 ID, entity 2 ID), then I don’t think there’s a great way that doesn’t require either this new SET_MULTI_VERSIONSTAMPED_KEY API or an additional commit. For example, you could write entity 1 and entity 2 in one transaction to assign them versionstamps, and then update the index in a separate transaction using only complete versionstamps read back from the result of the first transaction.


Thanks. I’d like to use these entity IDs in secondary indexes, so that is a restriction. I think I can work around it using UUIDs for now.

I’m interested in starting work on the API change.

  • Vanilla sets are equivalent to this scheme, but with the value \x00 appended at the end
  • Versionstamp operations in the older encoding can be converted to the new encoding by appending \x01 at the end
  • The \x00 suffix can be leveraged to allow the same MULTI_VERSIONSTAMPED_SET operation to be used for versionstamped key and value operations

I think I follow this logic for setting either keys or values, but I’m not entirely sure how to make it handle keys and values at the same time. Let’s say I want to write a versionstamp to a key and value in the same operation. Would the offset for the first byte in the key start at zero, and the offset for the first byte in the value start at key_length + 1?

This will be my first FDB patch. Are there any development process things I should be aware of?
