How does the multi-version client handle transactions?

The main mention of the multi-version client seems to be in the docs here.

I’m trying to understand what exactly happens when. We’ve got various systems that are currently loaded with 2 different FDB versions (7.1 and 7.3), and every now and again we’ll see a cluster_version_changed error logged.

We mostly use .read and .run, but occasionally for various reasons we manually create a transaction and choose when to complete it. (We’re using the Java bindings to call FDB from Clojure, which has a lot of laziness, so in some places we were dropping out of the ‘scope’ of the read/run before we’d actually made use of the returned data, i.e. before the transaction had actually executed.)
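To make the distinction concrete, here’s a rough sketch of the two styles we use (assuming API version 710 and the com.apple.foundationdb Java bindings; the keys are made up): .run with the result forced inside the lambda, versus a manually created and committed transaction.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.Transaction;
import com.apple.foundationdb.tuple.Tuple;

public class TxStylesSketch {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(710);   // assumption: API version 710
        try (Database db = fdb.open()) {
            // Style 1: db.run() wraps a retry loop. Forcing the result with
            // join() inside the lambda avoids lazily reading it after the
            // transaction has already been closed.
            byte[] value = db.run(tr ->
                    tr.get(Tuple.from("some", "key").pack()).join());

            // Style 2: manually created transaction; we choose when to
            // commit, and we own any retry handling ourselves.
            try (Transaction tr = db.createTransaction()) {
                tr.set(Tuple.from("other", "key").pack(),
                       value == null ? new byte[0] : value);
                tr.commit().join();
            }
        }
    }
}
```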

So my first question is: is it possible for cluster_version_changed to be returned as the final state from .read/.run, or is it always retried until some other state is returned, meaning all the instances we see will come from places where we’re manually creating and finishing txes?

Secondly, how and when does the client decide which network thread is ‘correct’? We’re calling .open to get a connection right as our app starts up, but we might not then create a tx and try to .get or .set until the first external request hits the JVM. My initial reading was (and our initial testing suggested) that every tx was sent to all version threads in parallel on the expectation that one would succeed and the others would throw, so if we were setting some external in-memory state within the tx, before the first get/set, we would see that updated twice instead of only once.

But maybe that’s only on the first tx after .open, and subsequent txes only use the one correct thread until cluster_version_changed is thrown again? Or is there some periodic interval after which it retries all threads? Or does it pick one of the threads at random first, and if .read/.run get cluster_version_changed they simply try another, so a request is never run for multiple versions in parallel? If they are run in parallel, what determines which result you get back? I’d expect the failure case to return first nearly all the time, because it has the least to do (fail on the first get/set), so is that result only returned if the ‘correct’ version thread times out or similar? Or will it nearly always be the case that the transaction runs twice: once in the correct thread initially, and once as a retry that knows which version to use, after the incorrect thread returns cluster_version_changed?

We’re trying to implement idempotency using something akin to the atomic_idempotency function from here, and we want to understand how that interacts with the multi-version client (given that the inbuilt feature explicitly doesn’t support it and isn’t prod-ready yet) and whether there are any pitfalls where we could get caught out.

In a lot of our code this is a non-issue, because the tx removes an item from a queue and processes it, adding more data to the DB as a result. So if there is any sort of conflict or error and a retry, and the transaction did actually succeed somewhere, the item will no longer be on the queue and something else will be pulled for processing. This is only a problem in specific edge cases where we are having to add idempotency on top of transactions that do not inherently have it ‘inbuilt’.

At the moment our assumptions are that:

  • It is not possible for a network thread of the wrong version to get ‘past’ an FDB get/set call in a tx, it will always throw a cluster_version_changed error at that point at the latest.
  • When using the multi-version client, it’s more likely that a transaction will be run against the cluster multiple times (first in parallel with ‘bad’ versions, then on the known-good version).

The .read and .run methods on Database both wrap retry loops, so you generally shouldn’t see them return cluster_version_changed unless you set a retry limit via a transaction or database option. One other caveat is that if you wrap the error in an error class of your own, those loops won’t retry the wrapped error even if the cause is a retryable FDB error.
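For example (a minimal Java sketch using the com.apple.foundationdb bindings; AppException and the key names are hypothetical), wrapping the error defeats the retry loop, while letting it propagate keeps it retryable:

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.tuple.Tuple;

public class RetryWrappingSketch {
    // Hypothetical application exception type.
    static class AppException extends RuntimeException {
        AppException(String msg, Throwable cause) { super(msg, cause); }
    }

    static byte[] readItemWrapped(Database db) {
        return db.run(tr -> {
            try {
                return tr.get(Tuple.from("queue", "head").pack()).join();
            } catch (RuntimeException e) {
                // BAD: the retry loop inside db.run() only recognizes FDB
                // errors. Re-wrapping in AppException means a retryable
                // error such as cluster_version_changed escapes to the
                // caller instead of being retried.
                throw new AppException("read failed", e);
            }
        });
    }

    static byte[] readItemRetryable(Database db) {
        // BETTER: let FDB errors propagate unchanged, so db.run() can hand
        // them to the transaction's onError() and retry retryable codes.
        return db.run(tr -> tr.get(Tuple.from("queue", "head").pack()).join());
    }
}
```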

The requests shouldn’t be scheduled in parallel, especially not in a way that is noticeable to the JVM. All of the multi-version logic is in the FDB C library that the Java bindings wrap. I believe it’s more-or-less what you suggested at the end: a single version is initially chosen, and then we stick with that version until an operation fails due to a client/server versioning error (which the client and server can determine because each message passed between the two is encoded with a “protocol version” which must match).

I think that documentation might be a little out of date, because the multi-version client is prod-ready and the retry loop it uses should work for cluster_version_changed. One thing that may be of note: in the retry loop, if you see that the idempotency ID is not set, that’s not enough to guarantee that the transaction wasn’t committed: it’s possible that the commit of the first transaction is effectively in progress and the second transaction was just assigned a read version from before the transaction was committed. However, once the transaction has committed successfully, then you can be sure of what has happened.

Here’s the basic scenario (and also what can happen if there’s a transaction timeout); a sketch of the idempotency pattern follows the list:

  1. Transaction 1 is created
  2. Work is done for transaction 1
  3. The commit for transaction 1 is submitted
  4. An error happens before the transaction commit state is known (cluster_version_changed, timeout etc.)
  5. Transaction 2 is created. All reads done will be from a snapshot view of the database here
  6. Transaction 1 commit completes
  7. Transaction 2 sees that the idempotency key is not set and so re-runs the work done in step 2
  8. Transaction 2 commits, but the commit fails due to a conflict on the idempotency key
  9. Transaction 3 can be created, and it sees that transaction 1 successfully committed
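As a concrete illustration, here’s a rough Java sketch of that idempotency-key pattern (my own sketch, not the atomic_idempotency function linked above; the key layout and helper names are made up):

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.Transaction;
import com.apple.foundationdb.tuple.Tuple;
import java.util.function.Consumer;

public class IdempotentWorkSketch {
    // Runs 'work' at most once per idempotencyId.
    static void runOnce(Database db, String idempotencyId, Consumer<Transaction> work) {
        db.run(tr -> {
            byte[] idemKey = Tuple.from("idem", idempotencyId).pack();
            byte[] existing = tr.get(idemKey).join();
            if (existing != null) {
                // The work definitely committed on an earlier attempt.
                return null;
            }
            // Caveat from the scenario above: existing == null does NOT
            // prove the earlier attempt never committed; its commit may
            // still be in flight (steps 5-7). The read of idemKey is what
            // protects us: if that commit lands first, this attempt fails
            // at commit time with a conflict (step 8) and db.run() retries,
            // now seeing the key (step 9).
            work.accept(tr);
            tr.set(idemKey, Tuple.from(System.currentTimeMillis()).pack());
            return null;
        });
    }
}
```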

Note that you could get the exact same behavior if, during the retry loop, rather than waiting for the commit response you just created a new transaction, which is effectively what happens when you retry on a timeout. This is relevant for cluster_version_changed because the process of switching from one client to another ends up giving up on all currently outstanding transactions to a given cluster, including ones that are in the middle of committing, even if it’s not that transaction that actually failed. That being said, assuming any actual cluster upgrades are one way (and even if there are downgrades), cases where a transaction commits late like this should be rare, if they happen at all, with cluster_version_changed, though this remains a real problem when retrying timeouts.

There are some cases where this can be a problem. If your operation has any side effects, you have to be careful not to treat an empty idempotency ID as a signal that those side effects did not happen, though you can rely on its presence to tell you that they already have.

The set operations are actually done entirely locally and don’t require talking to the cluster. A get call does have to talk to the cluster, and so it will throw cluster_version_changed if the chosen client isn’t correct. However, if there is an actual upgrade happening simultaneously with the transaction, then the cluster version can change on any call that actually talks to the database.
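For example (a Java sketch; the keys are arbitrary), this is roughly where the error can surface within a single transaction:

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.tuple.Tuple;

public class WhereTheErrorSurfaces {
    static byte[] sketch(Database db) {
        return db.run(tr -> {
            // set() only buffers the mutation in the local client; by itself
            // it cannot throw cluster_version_changed.
            tr.set(Tuple.from("a").pack(), Tuple.from("value").pack());

            // get() has to talk to the cluster, so this is the earliest
            // point where a wrong-version client can throw
            // cluster_version_changed.
            byte[] v = tr.get(Tuple.from("b").pack()).join();

            // A transaction with only sets still talks to the cluster at
            // commit() time (which db.run() calls after this lambda), so
            // the error can also surface there during an actual upgrade.
            return v;
        });
    }
}
```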

I think that is true, except for the “in parallel” part. Maybe put another way, using the multi-version client can result in retryable errors in more cases than when you don’t use it.


Exactly what we needed to know! Thank you very much!