Howdy,
I’ve finally gotten around to playing with the new client buggify feature and its managed to turn up an issue I’m not immediately 100% sure on how to address.
In the Erlang bindings we have a loop [1] that looks like such:
do_transaction(?IS_TX = Tx, UserFun) ->
try
Ret = UserFun(Tx),
case is_read_only(Tx) andalso not has_watches(Tx) of
true -> ok;
false -> wait(commit(Tx))
end,
Ret
catch error:{erlfdb_error, Code} ->
put(?ERLFDB_ERROR, Code),
wait(on_error(Tx, Code)),
do_transaction(Tx, UserFun)
end.
This was based off the similar retry loop in the Python bindings [2]. The interesting issue that buggify has turned up is the wait(on_error(Tx, Code))
will occasionally raise 1007/1009 errors. And given how that loop works it basically just short circuits the retry logic since it’s not protected by that try/catch.
My question is whether this a synthetic error or if there’s really some way that on_error would return a future that’s capable of throwing either of those errors. And if those version errors are real, how would a client be expected to react to such a situation other than some weird recursive retry-retry loop?
[1] https://github.com/apache/couchdb-erlfdb/blob/01f79dbb3938ceac4638752953b84a396d78c0ce/src/erlfdb.erl#L653-L665
[2] https://github.com/apple/foundationdb/blob/5876d694cbfda19cc4229fe4ca85b33d87fb81ea/bindings/python/fdb/impl.py#L264-L270