Versionstamp representations

I thought I’d ask before re-inventing the wheel: what do people use to represent and handle versionstamps when they need to be serialized or passed between languages/systems?

The specific use case I’m looking at is a Clojure+ClojureScript application, where the client code running in a browser should receive a versionstamp represented in a way which allows comparisons and passing it back to the server.

There is always the obvious solution, a vector of integer numbers:

  (vs/to-vector v)
  [0 0 1 209 211 241 247 146 0 0 0 0]

But surely there must be a less wasteful representation that still preserves ordering?

One option is to use the byte string representation itself (I suppose this could be thought of as an array of 1-byte integers). This has the required ordering property.

Another way to think about it is to divide it into its parts. The 12-byte versionstamp used in the bindings has three:

  1. An 8-byte integer version
  2. A 2-byte integer batch number (used to distinguish between batches committed at the same version)
  3. A 2-byte user version that is chosen by a user of the API

Note that the internal versionstamps are 10 bytes and omit this last part.

You could serialize these parts separately, though that sort of ends up looking the same as it takes 12 bytes with the same data (the byte string is a big-endian encoding of these integer values).
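To make the layout concrete, here is a minimal Python sketch of splitting a 12-byte versionstamp into those three parts and joining them back. The helper names are made up for illustration and are not part of any binding; the only thing it relies on is the big-endian 8+2+2 layout described above.

```python
import struct

# Layout as described above: 8-byte version, 2-byte batch number,
# 2-byte user version, all big-endian and unsigned.
_LAYOUT = struct.Struct(">QHH")


def split_versionstamp(vs: bytes) -> tuple[int, int, int]:
    """Hypothetical helper: 12 raw bytes -> (version, batch, user_version)."""
    if len(vs) != 12:
        raise ValueError("expected a 12-byte versionstamp")
    return _LAYOUT.unpack(vs)


def join_versionstamp(version: int, batch: int, user_version: int) -> bytes:
    """Hypothetical helper: inverse of split_versionstamp."""
    return _LAYOUT.pack(version, batch, user_version)


raw = bytes([0, 0, 1, 209, 211, 241, 247, 146, 0, 0, 0, 0])
version, batch, user_version = split_versionstamp(raw)
# Packing the parts again yields exactly the original byte string.
assert join_versionstamp(version, batch, user_version) == raw
```

Comparing the resulting tuples orders versionstamps the same way as comparing the raw byte strings, since each field is a fixed-width big-endian integer.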

If you are trying to get by with fewer than 12 bytes, you might be able to do so by using some other encoding for the integer components, depending on what kind of values you have for them.

I think I’d rather treat the entire versionstamp as an opaque value, without diving into internals. Also, I wasn’t clear: I’m not looking to get by with fewer bytes than 12, just looking for an encoding (like base64) that would have a reasonably small string representation, survive JSON transport, and preserve sort order.

So far I’ve found base32hex and d64 (dominictarr/d64 on GitHub), which are both reasonable options, but I thought I’d ask first: perhaps someone has solved this problem already.
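For reference, this is roughly what I have in mind for the base32hex variant, as a minimal Python sketch rather than a finished implementation. The function names `encode_vs` and `decode_vs` are my own. It assumes a fixed 12-byte input and leaves out the `=` padding, which is safe only because the length never varies.

```python
# base32hex alphabet from RFC 4648. The symbols are in ascending ASCII order,
# which is what makes the encoded strings sort like the underlying bytes.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def encode_vs(vs: bytes) -> str:
    """Encode a 12-byte versionstamp as 20 base32hex characters (unpadded)."""
    if len(vs) != 12:
        raise ValueError("expected a 12-byte versionstamp")
    # 96 bits do not divide evenly into 5-bit symbols, so append 4 zero bits
    # to get 100 bits = 20 symbols.
    n = int.from_bytes(vs, "big") << 4
    return "".join(ALPHABET[(n >> shift) & 31] for shift in range(95, -1, -5))


def decode_vs(text: str) -> bytes:
    """Inverse of encode_vs; rejects anything that is not a valid encoding."""
    if len(text) != 20:
        raise ValueError("expected 20 characters")
    n = 0
    for ch in text:
        n = n * 32 + ALPHABET.index(ch)  # raises ValueError on a bad symbol
    if n & 0xF:
        raise ValueError("non-zero padding bits")
    return (n >> 4).to_bytes(12, "big")


vs = bytes([0, 0, 1, 209, 211, 241, 247, 146, 0, 0, 0, 0])
assert decode_vs(encode_vs(vs)) == vs
```

The result is plain ASCII, so it passes through JSON untouched, and since every string has the same length, ordinary string comparison on the client should agree with byte-wise comparison on the server.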

I use the "@{TRANSACTION_VERSION}-{TRANSACTION_ORDER}" and "@{TRANSACTION_VERSION}-{TRANSACTION_ORDER}#{USER_VERSION}" representations for logging/display purposes, but I’m not really satisfied with them, to be honest: I get a lot of "@123456789-1" or "@123456789-1#1", for example, in my log files.

In hindsight, using ‘@’ and ‘#’ was not a good idea for URIs, but at least it makes it easier to visually compare the chronological ordering of multiple version stamps, and you can copy/paste the transaction version part into a transaction log which is sometimes useful.
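As a rough illustration of that display format, here is a small Python sketch. `format_display` is a hypothetical name, and I am assuming here that the 10-byte form maps to the variant without "#" and the 12-byte form to the variant with it.

```python
def format_display(vs: bytes) -> str:
    """Hypothetical formatter for the "@version-order[#user]" display style."""
    if len(vs) not in (10, 12):
        raise ValueError("expected a 10- or 12-byte versionstamp")
    version = int.from_bytes(vs[0:8], "big")   # transaction version
    order = int.from_bytes(vs[8:10], "big")    # transaction order
    text = f"@{version}-{order}"
    if len(vs) == 12:
        # Assumption: only 12-byte stamps carry a user version to show.
        text += f"#{int.from_bytes(vs[10:12], 'big')}"
    return text


stamp = (123456789).to_bytes(8, "big") + (1).to_bytes(2, "big") + (1).to_bytes(2, "big")
print(format_display(stamp))       # @123456789-1#1
print(format_display(stamp[:10]))  # @123456789-1
```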

I also sometimes use the underlying hexadecimal representation of the bytes, but this is not very human-friendly.

Your idea of having a string representation that keeps the ordering could be useful in some situations, and my current format does not allow that, though in practice transaction versions quickly become a very large number that starts with digit 1 or 2 so it kinda sorts almost chronologically after a while…
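A quick Python check of why the decimal form only "almost" sorts: strings with different digit counts compare in the wrong order. Zero-padding the version to 20 digits (enough for any unsigned 64-bit value) would fix that, but that padding is just one possible tweak, not something my format does today.

```python
versions = [9, 10, 123456789, 1000000000]  # already in chronological order

plain = [f"@{v}-0" for v in versions]
padded = [f"@{v:020d}-0" for v in versions]  # assumption: pad to 20 digits

# Unpadded decimal strings do not sort chronologically...
assert sorted(plain) != plain
# ...while fixed-width, zero-padded ones do.
assert sorted(padded) == padded
```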

Maybe a hybrid approach? Most 12-byte versionstamps have 3 parts: the transaction version (8 bytes), the order (2 bytes) and an optional user version (2 bytes). Maybe encoding each part individually and joining them would work better? Displaying multiple versionstamps aligned vertically would keep all 3 segments aligned, which is nice.

Using hexadecimal, it would be something like '0123456789ABCDEF-0000' or '0123456789ABCDEF-0000-0001' which kinda looks like a uuid. Is there a standardized 80-bit or 96-bit uuid representation that we could borrow?
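Something like the following Python sketch would produce that segmented form. The function names are placeholders, and the choice of uppercase hex and "-" as separator simply follows the examples above.

```python
def format_segmented_hex(vs: bytes) -> str:
    """Hypothetical formatter: 'VVVVVVVVVVVVVVVV-OOOO' or '...-OOOO-UUUU'."""
    if len(vs) not in (10, 12):
        raise ValueError("expected a 10- or 12-byte versionstamp")
    digits = vs.hex().upper()
    parts = [digits[0:16], digits[16:20]]  # version (8 bytes), order (2 bytes)
    if len(vs) == 12:
        parts.append(digits[20:24])        # user version (2 bytes)
    return "-".join(parts)


def parse_segmented_hex(text: str) -> bytes:
    """Inverse of format_segmented_hex."""
    return bytes.fromhex(text.replace("-", ""))


stamp = bytes.fromhex("0123456789ABCDEF00000001")
print(format_segmented_hex(stamp))       # 0123456789ABCDEF-0000-0001
print(format_segmented_hex(stamp[:10]))  # 0123456789ABCDEF-0000
```

Because every segment has a fixed width and the separators always sit at the same positions, strings of the same kind (all 10-byte or all 12-byte) should still sort in the same order as the raw bytes.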

We simply use the hex string representation of the versionstamp’s 12-byte array. So the versionstamp

[0 0 1 209 211 241 247 146 0 0 0 0]

becomes 000001d1d3f1f79200000000. This preserves the order, i.e. byte-wise ordering of byte arrays is equivalent to the lexicographic ordering of resulting strings.
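In Python terms (only as an illustration, this is not our actual code), the round trip and the ordering property look like this:

```python
vs = bytes([0, 0, 1, 209, 211, 241, 247, 146, 0, 0, 0, 0])

encoded = vs.hex()  # lowercase hex, two characters per byte
assert encoded == "000001d1d3f1f79200000000"
assert bytes.fromhex(encoded) == vs

# A later versionstamp compares greater both as bytes and as a hex string.
later = bytes([0, 0, 1, 209, 211, 241, 247, 147, 0, 0, 0, 0])
assert (vs < later) == (encoded < later.hex())
```

The ordering only holds if every producer uses the same letter case, since uppercase and lowercase hex digits sort differently relative to each other.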

We also tried the big-integer representation, but such integers are very hard to compare visually. Also, most versionstamps have leading and trailing zero bytes, which look nice in hex but become random-ish numbers in base 10.
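For comparison, a short Python sketch of the big-integer form of the same versionstamp. Printing it next to the hex string shows how the zero bytes stop being visible in base 10.

```python
vs = bytes([0, 0, 1, 209, 211, 241, 247, 146, 0, 0, 0, 0])

as_int = int.from_bytes(vs, "big")  # the whole 96-bit value as one integer
print(vs.hex())  # hex: leading and trailing zero bytes are easy to spot
print(as_int)    # decimal: the same value, without any visible structure

# The integer still round-trips to the original 12 bytes.
assert as_int.to_bytes(12, "big") == vs
```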
