If I understand versionstamps correctly, they are 12 bytes (Java): the first 8 being the commit version, the next two being the batch order FDB assigns when multiple transactions are committed under the same version, and the final two being an optional client-assigned user version.
Since many systems use 8-byte long values as ids, I'd like to piggyback on versionstamps to generate an increasing long sequence without extra coordination/reads, but truncating from 10 bytes down to 8 won't be safe if the commit was batched together with another.
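For concreteness, here's a rough sketch of what I mean by truncation, using the tuple layer's Versionstamp class from the Java bindings (the method name is just for illustration):

```java
import com.apple.foundationdb.tuple.Versionstamp;
import java.nio.ByteBuffer;

// Sketch only: shows why truncating the 10-byte transaction version to 8 bytes
// is not collision-safe. The first 8 bytes are the commit version and the next
// 2 bytes are the batch order, so two transactions batched under the same
// commit version collapse to the same long once the batch order is dropped.
public class TruncationSketch {
    public static long naiveTruncate(Versionstamp vs) {
        byte[] txVersion = vs.getTransactionVersion();          // 10 bytes: commit version + batch order
        long commitVersion = ByteBuffer.wrap(txVersion).getLong(); // reads only the first 8 bytes
        return commitVersion;                                    // not unique when batch order != 0
    }
}
```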
Maybe I'm wholly barking up the wrong tree, but one option I was wondering about is applying the hidden first_in_batch transaction option to all commits used to generate 8-byte ids, since I'm imagining two first_in_batch transactions cannot be coalesced. This way, all normal transactions proceed just fine, and id-generating ones are forced to get a 00 batch order. Or maybe I have no idea how this option works.
Any other approaches to consider? Something like the high-contention allocator wouldn’t match my exact use case, since the sequence needs to be increasing.
first_in_batch is an interesting suggestion; I will be curious what the core team thinks about it.
How frequent are id-generating transactions, and what is the cost of a conflict among them? One very simple approach could be to maintain a single next_id row and read-use-increment-write it in the same transaction that needs the id generated (see the sketch below).
The obvious drawback is that id-generating operations will be serialized on this row and may cause conflicts.
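If it helps, a minimal sketch of that approach with the Java bindings; the next_id key name is just an assumption for illustration:

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.tuple.Tuple;

// Sketch only: read-increment-write a single counter row inside the same
// transaction that needs the id. Concurrent allocations conflict on this key,
// one of them retries, so allocations are effectively serialized.
public class NextIdCounter {
    // Hypothetical key; in a real layer this would live under the layer's subspace.
    static final byte[] NEXT_ID_KEY = Tuple.from("next_id").pack();

    public static long allocateId(Database db) {
        return db.run(tr -> {
            byte[] raw = tr.get(NEXT_ID_KEY).join();
            long next = (raw == null) ? 0L : Tuple.fromBytes(raw).getLong(0);
            tr.set(NEXT_ID_KEY, Tuple.from(next + 1).pack());
            // ... do the work that needs `next` in the same transaction ...
            return next;
        });
    }
}
```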
Ah, I was hoping you'd show up, as you've probably thought more about clever ways to use versionstamps than I have. My first reply was going to ask the same question you did anyway, as I think the answer determines what an acceptable solution will be.
I thought @lihtnes tried using FIRST_IN_BATCH while implementing snapshot backup, and found it didn't work the way he expected. Reading the code for it, it looks like any FIRST_IN_BATCH transaction just causes the current set of batched transactions to be sealed and a new batch to be created. So I'd think that this would work, but I'd feel way more comfortable adding this invariant check to a test workload and grinding simulation on it for a while before telling you it's safe. If you use it too much, it would definitely mess with your throughput/latency.
FWIW, there's only one use of FIRST_IN_BATCH, which appears to exist to fix an issue where, if a storage server changes its tag in the middle of a transaction batch, any mutation before the tag change would be sent to the wrong storage server. It appears to exist only to serve this one workaround, and that's why it's a hidden option.
There are some strategies that can be adapted to allow for "medium concurrency". For example: read the counter at snapshot isolation, add a random value between 1 and n, and write the result back using the MAX atomic op. Guaranteeing uniqueness is then a matter of setting appropriate conflict ranges, or reading/writing/modifying an appropriate key, and the probability of a conflict can be tuned by choosing different values for n (see the sketch below).
Though this may not work depending on the exact semantics of "needs to be increasing" in the original question.
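A minimal sketch of that idea with the Java bindings, assuming hypothetical counter and ("claimed", id) keys, where the latter exists purely as a conflict range:

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.MutationType;
import com.apple.foundationdb.tuple.Tuple;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.ThreadLocalRandom;

// Sketch only: snapshot-read the counter, pick a random candidate above it,
// claim the candidate via a conflict key, and advance the counter with MAX.
public class MediumConcurrencyIds {
    static final byte[] COUNTER_KEY = Tuple.from("counter").pack(); // hypothetical key

    static byte[] littleEndian(long v) {
        // Atomic op parameters are interpreted as little-endian integers.
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
    }

    public static long allocateId(Database db, int n) {
        return db.run(tr -> {
            // Snapshot read avoids a conflict on the counter itself.
            byte[] raw = tr.snapshot().get(COUNTER_KEY).join();
            long current = (raw == null) ? 0L
                    : ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getLong();
            long candidate = current + 1 + ThreadLocalRandom.current().nextInt(n);

            // Conflict only with another transaction claiming the same id.
            byte[] claimKey = Tuple.from("claimed", candidate).pack();
            tr.addReadConflictKey(claimKey);
            tr.addWriteConflictKey(claimKey);

            // Advance the high-water mark without a serializable read of it.
            tr.mutate(MutationType.MAX, COUNTER_KEY, littleEndian(candidate));
            return candidate;
        });
    }
}
```

Two transactions that pick the same candidate conflict on the claimed key and one retries; otherwise they commit concurrently, at the cost of ids that may be assigned out of commit order.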
For a little context, I've been poking at the fdb-zk layer, and one of the things Zookeeper proper does is assign an increasing global transaction id to writes, incrementing by 1 each time. One place ZK uses this property is for happens-before/after queries, e.g. a client asks the server: give me all updates to this set of ZNodes with txn id > last_seen_txn_id.
Moving things over to FDB, I was initially hoping to use versionstamps to get a global write ordering without additional coordination, but I misunderstood the layout (mistook the 2-byte batch order for the 2-byte optional user version), and am now realizing this could assign non-unique ids after truncating down to a long when the batch order != 0.
This id-generation mode will be valuable to have for 1:1 test compatibility with ZK's own test suite (which assumes the increment-by-1 strategy), but as you mention, it will probably cause too many conflicts. In this scenario, every ZNode create/update/delete requires a unique id, which would put a pretty low upper bound on write throughput.
There is a "hack" that comes to mind that will let you map a 10-byte versionstamp to an 8-byte unique value, but this scheme will be valid only for 3,258 days (~9 years) from the moment of first use in a layer instance:
At the time of layer setup, record the current 8-byte commit version value, call it I.
In every id-generating transaction:
VS = <8-byte commit version (CV) | 2-byte batch id (BI)>
D = CV - I (8 bytes). (D is as unique as CV.)
ID = (D << 16) | BI. (D combined with BI is as unique as CV combined with BI.)
This scheme will work until the delta (D) no longer fits in its low 6 bytes, since the left shift discards the upper 2 bytes of the 8-byte value. Under default settings, FDB advances the version by roughly one per microsecond, so the scheme will survive for about 2^48 microseconds.
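A sketch of the computation, assuming the 10-byte array returned by Transaction.getVersionstamp() after commit and the 8-byte base version I recorded at layer setup:

```java
import java.nio.ByteBuffer;

// Sketch only: map the 10-byte transaction versionstamp to an 8-byte id using
// ID = (D << 16) | BI, where D = CV - I. Valid only while D fits in 48 bits.
public final class VersionstampToLong {
    public static long toId(byte[] trVersionstamp, long baseVersion /* I */) {
        ByteBuffer buf = ByteBuffer.wrap(trVersionstamp); // versionstamps are big-endian
        long commitVersion = buf.getLong();               // first 8 bytes: CV
        int batchOrder = buf.getShort() & 0xFFFF;         // next 2 bytes: BI
        long delta = commitVersion - baseVersion;         // D
        if ((delta >>> 48) != 0) {
            throw new IllegalStateException("delta no longer fits in 6 bytes");
        }
        return (delta << 16) | batchOrder;                // ID
    }
}
```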
FIRST_IN_BATCH does work based on my limited testing.
Please see the last comment in the above thread. It was not suitable for the exec op case in the snapshot project, because the snapshot implementation in the TLog parsed only the first mutation, and if there were multiple exec op mutations in the same transaction it did not process the others. Hope that clarifies.