Question regarding retry logic and transaction get_versionstamp API

rajivr · November 2, 2021, 12:02pm

For the Rust binding layer that I am currently working on, while writing documentation for the Versionstamp module, I wanted to create an example similar to this example from Java bindings.

Following is the code in Rust. (Full working example is here).

let vs = fdb_database
    .run(|tr| {
        let mut t = Tuple::new();
        t.add_versionstamp(Versionstamp::incomplete(0));
        tr.mutate(
            MutationType::SetVersionstampedKey,
            (t.pack_with_versionstamp(Bytes::from_static(&b"prefix"[..])))
                .unwrap()
                .into(),
            Bytes::from_static(&b""[..]),
        );

	    let vs_fut = tr.get_versionstamp();

        tr.commit().join()?;

	    let vs = vs_fut.join()?;

	    Ok(vs)
    })
    .unwrap_or_else(|err| panic!("Error occurred during `run`: {:?}", err));

As I was looking at this code, I realized if the commit future succeeds and if the get_versionstamp future fails, there is a possibility that .run method could retry the closure, and commit twice.

Does the standard retry logic using on_error ensure that this does not happen?

Asking the same question in a slightly different way - Can we assume that once tr.commit() succeeds, vs_fut will also return the transaction versionstamp without generating an error?

alloc · November 3, 2021, 9:11pm

Hm, the get_versionstamp future is somewhat weird, in that it must be called before the transaction commits, but it will not be completed until after (essentially, being filled in with information from the commit response message).

The way the standard retry works, it will call commit for you, essentially being equivalent to:

def run(lambda):
    tr = db.create_transaction()
    while True:
        try:
            result = lambda.apply(tr)
            tr.commit()
            return result
        except Err as e:
            tr = tr.on_error(e)

With on_error handling here decisions like whether to retry (or if it’s hit a limit) or whether the error is retryable.

So, in the example for the Java bindings, the lambda returns the result of getVersionstamp but doesn’t try and commit, and so the transaction will only be committed once, and it will always have been committed if .run returns a successful result.

That’s right…for the most part. Once tr.commit succeeds (or maybe more precisely “once the transaction that created the versionstamp future commits”) , it will fill in vs_fut with the versionstamp from the commit response. This means that the only way that the future fails is if something weird happens, e.g., the client is closed between when the commit succeeds and vs_fut is completed, and then vs_fut returns a “cancelled” exception or something.

rajivr · November 4, 2021, 1:18am

@alloc Thanks for the reply.

alloc:

The way the standard retry works, it will call commit for you, essentially being equivalent to:
def run(lambda):
    tr = db.create_transaction()
    while True:
        try:
            result = lambda.apply(tr)
            tr.commit()
            return result
        except Err as e:
            tr = tr.on_error(e)
With on_error handling here decisions like whether to retry (or if it’s hit a limit) or whether the error is retryable.

In the Rust bindings .run method is implemented in a very similar way here.

That’s correct! I have not actively programmed in Java in a very very long time, but as I looked at the following code from JavaDoc

  CompletableFuture<byte[]> trVersionFuture = db.run((Transaction tr) -> {
       // The incomplete Versionstamp will be overwritten with tr's version information when committed.
       Tuple t = Tuple.from("prefix", Versionstamp.incomplete());
       tr.mutate(MutationType.SET_VERSIONSTAMPED_KEY, t.packWithVersionstamp(), new byte[0]);
       return tr.getVersionstamp();
   });

   byte[] trVersion = trVersionFuture.get();

I realized that when trVersionFuture value gets returned from db.run, the Transaction tr value passed to the function by db.run would still be live and GC’ed later.

Hence the .run() method can call the commit once.

However, in case of Rust, we do not have a GC, and in the API design we tie the lifetime of all FDB futures to the lifetime of the transaction (i.e., lifetime of tr argument in the above example).

Because of this, I had to call commit twice. Once inside the closure and another time from inside the .run method. Luckily calling commit twice did not generate an error, otherwise I would have had to create a separate method for versionstamp (something like run_and_get_versionstamp()).

I have not yet started with developing FDB layers, but how common is it to return versionstamp from db.run methods in something like Record Layer?

A good way of handling this case in Rust would be

Introduce a run_and_get_versionstamp method, that would return a Rust tuple containing the result of the Rust closure, and the versionstamp.
Make get_versionstamp method unsafe, so the caller is aware of this issue.

Please correct me if I am wrong, I think we need to worry about two error conditions here.

The one generated from commit (after retrying fails), in which case, the versionstamp would be irrelevant.
The commit succeeds, but versionstamp future fails. In which case, the caller should be careful in retrying the transaction, in case she is depending on versionstamp to maintain invariants.

Also are there any other APIs similar to get_versionstamp that would need additional care?

alloc · November 5, 2021, 12:16am

Hm, interesting. I think that tying a future to its associated transaction should work, though it’s maybe a bit dicey. The thing that makes getVersionstamp somewhat unique is that it’s one of the few times where the user is trying to return the future from within the lambda instead of the value returned by a future. Which definitely makes code that has to think about lifetimes a bit tricky.

It’s possible that getVersionstamp might need to have a different lifetime than the transaction’s lifetime in order to make it work naturally (i.e., without a double commit, which may work, but I believe a second commit to a single transaction is discouraged–and is perhaps undefined behavior). Though I’m not super familiar with how Rust handles lifetimes, so I’m not sure how feasible that would be.

Well, that’s a bit hard to say. In practice, the Record Layer always collects the versionstamp from a transaction and stores it locally. You can see that in the wrapper of the commit method: https://github.com/FoundationDB/fdb-record-layer/blob/233a72a51182a1f35b9bad34ac29b127a437a12b/fdb-record-layer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/FDBRecordContext.java#L437

The value, however, is typically only used to set some state on the transaction-wrapper, and so clients who want to inspect the versionstamp essentially only have to commit() their transaction and then when that completes, look at the value of getVersionStamp. My guess is that that is not all that frequently done, with the main use case being around versionstamp operations (where the value returned by the transaction can be used to know what “version” was written into the data). I guess that’s sort of equivalent to the run_and_get_versionstamp method your describing, insofar as the user has an alternate means of getting the versionstamp than by calling the method themselves.

Yes, that’s right. In that case, I believe the future can be ignored, though I’m not 100% certain what it is completed to (if anything).

Yes, that’s the other case. In most of the use cases where I think someone might want to look at the versionstamp, it’s mainly to introspect data that they just wrote (e.g., if they wrote a versionstamped key or value, they might want to use the return value from getVersionstamp to know what exact key or value was written). If the versionstamp future fails, then I think they can treat this the same way they’d need to treat a commit_unknown_result, i.e., cases where a commit might or might not have failed (e.g., because the database connection failed between when the transaction was submitted and when the response was issued to the client). In either the case of a getVersionstamp failure or a commit_unknown_result, the caller doesn’t know exactly what data was written, so if their operation isn’t idempotent, they might need to issue some kind of query to see if the data is there before retrying (or something like that). But it does get a little hairy and a little complicated for users to follow.

I kind of wish that instead the commit method returned a commit_response message or something with the versionstamp (and the commit version and perhaps other data), and then the user could ask for and maybe introspect the commit response, but that’s not quite how the API is set up.

rajivr · November 5, 2021, 1:21am

Thanks again @alloc for the detailed reply!

Treating the get_versionstamp future failure similar to commit_unknown_result, would simplify the API design. I’ll go ahead and introduce run_and_get_versionstamp method and make get_versionstamp method unsafe.

Btw, I still don’t have a good idea on how to correctly use commit_unknown_result for layer development. Its one of the places where I am very likely going to come back and ask more questions in the the future!

rajivr · November 7, 2021, 5:37am

@alloc I explored this further. It turns out that it was possible to safely return the Transaction value from .run method.

You can ignore the Rust type system noise, but here is how such an interface would look. (Working code is here).

    fn run_and_get_transaction<T, F, Tr>(
        &self,
        f: F,
    ) -> FdbResult<(T, Box<dyn Transaction<Database = Self::Database>>)>
    where
        Self: Sized,
        F: Fn(&dyn Transaction<Database = Self::Database>) -> FdbResult<T>;

The issue however, is that even after returning the Transaction value, it seems like for get_versionstamp() to work correctly, the get_versionstamp() future must be created before calling commit() on the transaction.

Because of this constraint, it seems like we really need a run_and_get_versionstamp method.

I also ended up keeping run_and_get_transaction method, as its needed to make get_committed_version work correctly.

This thread has been very helpful. As I was working on this issue, I also discovered a subtle bug in my earlier .run() implementation, which I have now fixed. Hopefully the binding tester will weed out any remaining hidden bugs.

Thanks again for taking time to provide detailed reply.

Here is the final code that started this thread!

Topic		Replies	Views
Proper transaction retry loops Using FoundationDB	3	548	April 29, 2020
Use case of versionstamp and behavior of pack_with_versionstamp Using FoundationDB bindings	6	2386	November 26, 2018
Node bindings release 0.8.0, now with Versionstamp support in the tuple layer! Using FoundationDB	13	760	January 29, 2019
Golang Client Error Wrapping Convention Using FoundationDB bindings	4	567	November 22, 2019
Why does get versionstamp() return a future that resolves with 0 bytes? Development bindings	5	537	August 21, 2020

Question regarding retry logic and transaction get_versionstamp API

Related topics