`CANCEL` instruction behavior in python bindings

rajivr · December 2, 2021, 12:24pm

I am encountering a very puzzling behavior with the Python binding’s CANCEL instruction.

The sequence of test instructions was initially generated by the binding tester. After bisecting, I’ve narrowed it down to 12 instructions, so we have a minimal reproducible example. Instructions 0..=3 set up last_version to a good state for the stack machine. The instructions causing the issue are 4..=11.

// setup `last_version` to some good value
0.  "NEW_TRANSACTION"
1.  "GET_READ_VERSION"
2.  "COMMIT"
3.  "WAIT_FUTURE"

// minimal instruction sequence
4.  "NEW_TRANSACTION"
5.  "COMMIT"
6.  "WAIT_FUTURE"
7.  "GET_COMMITTED_VERSION"
8.  "RESET"
9.  "SET_READ_VERSION"
10. "CANCEL"
11. "GET_READ_VERSION"

In Python, after the CANCEL instruction, the GET_READ_VERSION instruction returns FDB error 2011 (version_invalid).

Here is the minimal working example that you can run to verify this behavior. I’ve posted the output log here.

As you can see from line 92, we are getting ERROR: FDBError(2011) for GRV.

Here is the output log for Rust. For the same sequence of instructions, the Rust binding generates FDB error 1025 (transaction_cancelled), as you can see in line 157.

Transaction cancelled actually makes more sense, but the binding tester will obviously not like it.

Is something in the Python bindings causing this behavior, and if so, how can I emulate it?

ajbeamon · December 2, 2021, 3:39pm

Looking at both sequences, instruction 7 (GET_COMMITTED_VERSION) is setting last_version to -1. This then gets set as the read version in instruction 9, and setting the version to a negative number will generate the 2011 error. This error does not get thrown immediately, though. Rather, it is a deferred error that you will see the next time you try to run an asynchronous operation, which in this case is your GET_READ_VERSION call. This error is going to take precedence over the cancellation error you would see from canceling the transaction, so that’s why you see it in Python.

I’m not sure exactly why you aren’t seeing it in Rust, but you could start by running a very simple sequence in each language and see if they differ. Something like the following in Python:

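tr.set_read_version(-1)       # negative version: the 2011 error is deferred, not raised here
tr.cancel()
tr.get_read_version().wait()  # next asynchronous operation: the deferred error surfaces here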

If these return the same results, then possibly something is wrong with the logic of the tester. If they return different results, then you’d probably need to look somewhere in the bindings to see what’s going on.
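The precedence described above can be pictured with a toy model. This is only a minimal sketch and not the fdb module: ToyTransaction, ToyError and run_sequence are hypothetical names, and the placeholder read version of 0 is arbitrary. The only things taken from this thread are the two error codes and the rule that the deferred error from a negative read version wins over the cancellation error.

```python
# Toy model only: this is NOT the fdb module. All names are hypothetical.
# The error codes are the two quoted in this thread.
VERSION_INVALID = 2011
TRANSACTION_CANCELLED = 1025


class ToyError(Exception):
    def __init__(self, code):
        super().__init__(f"ToyError({code})")
        self.code = code


class ToyTransaction:
    def __init__(self):
        self.deferred_error = None  # recorded now, raised later
        self.cancelled = False

    def set_read_version(self, version):
        # A negative version is invalid, but nothing is raised here.
        if version < 0:
            self.deferred_error = VERSION_INVALID

    def cancel(self):
        self.cancelled = True

    def get_read_version(self):
        # Assumed precedence, following the explanation above:
        # the deferred error is reported before the cancellation.
        if self.deferred_error is not None:
            raise ToyError(self.deferred_error)
        if self.cancelled:
            raise ToyError(TRANSACTION_CANCELLED)
        return 0  # arbitrary placeholder read version


def run_sequence(version):
    """Run set_read_version, cancel, get_read_version; return the error code."""
    tr = ToyTransaction()
    tr.set_read_version(version)
    tr.cancel()
    try:
        return tr.get_read_version()
    except ToyError as e:
        return e.code


print(run_sequence(-1))   # 2011
print(run_sequence(100))  # 1025
```

With a valid version the model reports 1025, and with -1 it reports 2011, which matches the Python output described in the first post.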

rajivr · December 3, 2021, 1:56am

@ajbeamon Thanks a lot for the reply and the above code snippet.

I am now able to reproduce the 2011 error in both Rust and Python. I will now try to figure out where the bug is in my tester.

Thanks again!

rajivr · December 3, 2021, 3:27am

Okay. I think I fixed the error.

Turns out I was running get_read_version inside a closure, and that closure was passed to the read method.

rt.read(|rtr| Ok(unsafe { rtr.get_read_version() }.join()?))

The read method implements standard retry logic, and that was eating the 2011 error. On the second iteration of the loop, it got the 1025 error and promptly returned it.

I had copy-pasted similar-looking code from the Go bindings without realizing that ReadTransact in Go does not implement retry logic for certain types.
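The masking effect can be shown with a toy model in Python. This is a rough sketch under stated assumptions, not the Rust or Go binding code: ToyError, make_get_read_version, read_with_retry, read_once and error_code are hypothetical names. The stand-in operation fails with 2011 on its first call and 1025 afterwards, as observed above, and the loop is simply assumed to retry after 2011 (the thread does not say why the real loop did).

```python
# Toy model only: not the Rust or Go binding. All names are hypothetical.
class ToyError(Exception):
    def __init__(self, code):
        super().__init__(f"ToyError({code})")
        self.code = code


def make_get_read_version():
    """Build a stand-in for GET_READ_VERSION that fails with 2011 on the
    first call and with 1025 on every later call."""
    calls = {"n": 0}

    def get_read_version():
        calls["n"] += 1
        raise ToyError(2011 if calls["n"] == 1 else 1025)

    return get_read_version


def read_with_retry(fn, max_attempts=5):
    # Assumption: the loop retries after any error except 1025.
    for _ in range(max_attempts):
        try:
            return fn()
        except ToyError as e:
            if e.code == 1025:
                raise  # reported to the caller
            # any other error is swallowed and the closure runs again
    raise RuntimeError("retries exhausted")


def read_once(fn):
    # No retry: the first error reaches the caller unchanged.
    return fn()


def error_code(runner):
    """Run a fresh stand-in operation through `runner`; return the error code."""
    try:
        runner(make_get_read_version())
    except ToyError as e:
        return e.code


print(error_code(read_with_retry))  # 1025
print(error_code(read_once))        # 2011
```

Running the operation once, outside any retry loop, is what lets the 2011 error reach the tester.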
