Proper transaction retry loops

davisp · April 28, 2020, 9:37pm

Howdy,

I’ve finally gotten around to playing with the new client buggify feature and its managed to turn up an issue I’m not immediately 100% sure on how to address.

In the Erlang bindings we have a loop [1] that looks like such:

do_transaction(?IS_TX = Tx, UserFun) ->
    try
        Ret = UserFun(Tx),
        case is_read_only(Tx) andalso not has_watches(Tx) of
            true -> ok;
            false -> wait(commit(Tx))
        end,
        Ret
    catch error:{erlfdb_error, Code} ->
        put(?ERLFDB_ERROR, Code),
        wait(on_error(Tx, Code)),
        do_transaction(Tx, UserFun)
    end.

This was based off the similar retry loop in the Python bindings [2]. The interesting issue that buggify has turned up is the wait(on_error(Tx, Code)) will occasionally raise 1007/1009 errors. And given how that loop works it basically just short circuits the retry logic since it’s not protected by that try/catch.

My question is whether this a synthetic error or if there’s really some way that on_error would return a future that’s capable of throwing either of those errors. And if those version errors are real, how would a client be expected to react to such a situation other than some weird recursive retry-retry loop?

[1] https://github.com/apache/couchdb-erlfdb/blob/01f79dbb3938ceac4638752953b84a396d78c0ce/src/erlfdb.erl#L653-L665
[2] https://github.com/apple/foundationdb/blob/5876d694cbfda19cc4229fe4ca85b33d87fb81ea/bindings/python/fdb/impl.py#L264-L270

davisp · April 28, 2020, 9:42pm

I was just pointed at the docs by a coworker where the on_error mentions:

If the resulting future is itself an error, destroy the future and FDBTransaction and report the error in an appropriate way.

So I guess that means we should expect the possibility here. But that seems a bit awkward given the nature of 1007/1009 being retryable.

ajbeamon · April 28, 2020, 9:44pm

I believe the only reason you should see 1007 or 1009 thrown from on_error is if you have a retry limit set, in which case the last error reported will be thrown when the retry limit is reached.

Otherwise, I don’t believe it is supposed to throw these errors. If the above isn’t your problem, it sounds like something we should investigate.

davisp · April 29, 2020, 8:54pm

Good call! That was the issue. I’ll have to add some more error handling for those but I can adjust the tests easily enough to really ratchet up the retry counts when running in buggify mode.

Topic		Replies	Views
Python api handle exceptions with @fdb.transactional Using FoundationDB	2	480	October 10, 2021
Checking if an error is retryable? Using FoundationDB bindings	1	1868	November 29, 2018
What does it mean "Transaction Is Retryable", In particular to Handle Error of "Transaction Too Old"? Using FoundationDB bindings	5	3551	June 13, 2019
Java bindings: "Transaction too old" errors during point reads should retry but do not Development	0	223	September 23, 2023
Prevent retrying on slow transactions Using FoundationDB	1	714	November 14, 2019

Proper transaction retry loops

Related topics