Transaction Too Old error when using a common read version with multiple client connections

Our application splits the read requests for multiple range scans across several separate processes, each issuing its own read requests, to improve read parallelism. When we try to use the same read version for each of the individual read requests, we immediately get a transaction_too_old error (error code 1007). The Coordinator process calls get_read_version() and sends the resulting read version to the Read Worker processes. Each Read Worker creates a transaction, calls set_read_version() and then get_ranges(), and sends the results of the range scan back to the Coordinator. The get_ranges() requests all fail immediately with error code 1007, even though only single-digit milliseconds elapse between the initial get_read_version() call and the get_ranges() request. Every range scan fails, every time.

Is a read version tied to an individual client connection, or is there some other reason why multiple client connections cannot share the same read version?
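To make the intended design concrete, here is a std-only Rust model of the fan-out described above. It is not the foundationdb crate API: the in-memory `Store`, `scan_at` and `parallel_scan` are hypothetical stand-ins, and threads play the role of the separate worker processes. The point it shows is that a read version is just an integer, and every worker that reads at that integer should see the same consistent snapshot.

```rust
use std::collections::BTreeMap;
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical in-memory multi-version store: key -> (commit version -> value).
type Store = BTreeMap<String, BTreeMap<i64, String>>;

/// Every key in [begin, end) as of `read_version`: for each key, the newest
/// value committed at or before that version (keys not yet written are skipped).
fn scan_at(store: &Store, begin: &str, end: &str, read_version: i64) -> Vec<(String, String)> {
    store
        .range(begin.to_string()..end.to_string())
        .filter_map(|(key, versions)| {
            versions
                .range(..=read_version)
                .next_back()
                .map(|(_, value)| (key.clone(), value.clone()))
        })
        .collect()
}

/// Coordinator side: fix one read version, hand each range plus that version to
/// a worker thread (standing in for a separate process), and reassemble the
/// results in range order.
fn parallel_scan(store: Arc<Store>, read_version: i64, ranges: &[(&str, &str)]) -> Vec<(String, String)> {
    let (tx, rx) = mpsc::channel();
    for (index, &(begin, end)) in ranges.iter().enumerate() {
        let tx = tx.clone();
        let store = Arc::clone(&store);
        let (begin, end) = (begin.to_string(), end.to_string());
        thread::spawn(move || {
            // The worker receives only a plain integer version, never a transaction.
            let rows = scan_at(&store, &begin, &end, read_version);
            tx.send((index, rows)).expect("coordinator hung up");
        });
    }
    // Drop the original sender so `rx.iter()` ends once every worker has replied.
    drop(tx);
    let mut parts: Vec<(usize, Vec<(String, String)>)> = rx.iter().collect();
    parts.sort_by_key(|(index, _)| *index);
    parts.into_iter().flat_map(|(_, rows)| rows).collect()
}

fn main() {
    let mut store = Store::new();
    store.entry("a".into()).or_default().insert(10, "a@10".into());
    store.entry("a".into()).or_default().insert(30, "a@30".into());
    store.entry("m".into()).or_default().insert(20, "m@20".into());
    store.entry("z".into()).or_default().insert(40, "z@40".into());

    // Both workers read at version 25, so together they see one consistent snapshot.
    let rows = parallel_scan(Arc::new(store), 25, &[("a", "n"), ("n", "~")]);
    println!("{:?}", rows);
}
```

In the real system each worker would instead call set_read_version() with the received integer before its range read; the model only captures why sharing that integer is meant to be safe.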

We're currently using the Rust foundationdb 0.4.2 crate as the client, on Linux, and testing locally with the memory storage engine.

The read version you receive from one client should be usable from any other, as long as you use it on the same cluster in a timely manner (by default, a read version remains valid for roughly five seconds). One possible concern is ratekeeper's ability to throttle your traffic, since it typically does so through get_read_version requests, but if you aren't saturating the cluster it should work OK.

One experiment you could try is to get a read version in a transaction on your worker processes and compare it to the one you receive from the coordinator process. If they are very different, that would be telling. It would also be interesting to know whether you see the same errors on transactions that fetch their own read versions.
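A small helper for interpreting that comparison could look like the sketch below. It assumes FoundationDB's default behaviour that versions advance at roughly one million per second and that a read version stays usable for about five seconds (five million versions); the function name and the sample numbers are hypothetical.

```rust
/// Assumed FoundationDB defaults: versions advance at roughly one million per
/// second, and a read version stays usable for about five seconds.
const VERSIONS_PER_SECOND: i64 = 1_000_000;
const MAX_READ_LIFE_VERSIONS: i64 = 5 * VERSIONS_PER_SECOND;

/// Compare the version a worker received from the coordinator with one the
/// worker just fetched itself. Returns the apparent age of the received version
/// in seconds and whether that age is plausible for a successful read.
fn compare_versions(received: i64, fetched_locally: i64) -> (f64, bool) {
    let gap = fetched_locally - received;
    let age_seconds = gap as f64 / VERSIONS_PER_SECOND as f64;
    // A negative gap (the received version is newer than one fetched later)
    // would itself be suspicious, e.g. a different cluster or a mangled integer.
    let plausible = (0..=MAX_READ_LIFE_VERSIONS).contains(&gap);
    (age_seconds, plausible)
}

fn main() {
    // Hypothetical numbers: a few milliseconds apart, as described in the question.
    let (age, ok) = compare_versions(100_000_000, 100_003_000);
    println!("age = {age:.3}s, plausible = {ok}");

    // Six seconds' worth of versions would, on its own, explain error 1007.
    let (age, ok) = compare_versions(100_000_000, 106_000_000);
    println!("age = {age:.3}s, plausible = {ok}");
}
```

If the gap comes out at a few milliseconds and the reads still fail, the version's age is unlikely to be the cause, which points the investigation at the client rather than at the MVCC window.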

Thanks for the response! That was my understanding as well. Tom played around with things and noticed that everything worked fine as long as we called get_read_version() at least once in the process. Our child workers only ever use read versions handed to them by the coordinator. It seems that the client is completely unusable until you call get_read_version() (directly or indirectly) at least once. Arguably, we're holding it wrong, but it would be nice to at least document this rough edge. I'm happy to file a GitHub issue.

For reference, the code that is biting us is here: https://github.com/apple/foundationdb/blob/master/fdbclient/DatabaseContext.h#L249

I wonder whether this has something to do with the Rust crate, which is a third-party client library not directly supported by the FoundationDB project.

The C client should fetch a read version itself (get_read_version) if you don't set one, so that's somewhat fishy. I suppose it's also possible that, say, the integer is being passed incorrectly from Rust to C somehow. But that's just speculation.

To be clear, we call set_read_version() on every single transaction in this process, all of which are read-only. As the code was originally written, get_read_version() was never called, not even indirectly, and the FDB client fails every read when that happens. Conversely, when we call get_read_version() once as part of startup, everything works as expected.
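The linked DatabaseContext.h line is not reproduced in this thread, so the following is only a simplified model of the behaviour reported here. It assumes that each client connection remembers the lowest read version it has fetched itself, starting at the maximum value, and rejects reads at any explicitly set version below that floor with error 1007. `ClientConnection` and its methods are hypothetical names, not FoundationDB's.

```rust
/// Error code the real client reports as transaction_too_old.
const TRANSACTION_TOO_OLD: i32 = 1007;

/// Hypothetical model of one client connection's database handle. Assumption:
/// it remembers the lowest read version it has itself fetched, starting at the
/// maximum value, and refuses reads below that floor.
struct ClientConnection {
    min_acceptable_read_version: i64,
}

impl ClientConnection {
    fn new() -> Self {
        // Nothing fetched yet, so every explicitly set version counts as "too old".
        ClientConnection { min_acceptable_read_version: i64::MAX }
    }

    /// Called whenever this connection obtains a read version from the cluster.
    fn record_fetched_read_version(&mut self, version: i64) {
        self.min_acceptable_read_version = self.min_acceptable_read_version.min(version);
    }

    /// Check made before a read at an explicitly set version.
    fn validate_read_version(&self, version: i64) -> Result<(), i32> {
        if version < self.min_acceptable_read_version {
            Err(TRANSACTION_TOO_OLD)
        } else {
            Ok(())
        }
    }
}

fn main() {
    // Original setup: the worker never fetches a read version of its own.
    let cold_worker = ClientConnection::new();
    println!("{:?}", cold_worker.validate_read_version(2_000));

    // Workaround: the worker fetches one read version at startup, before the
    // coordinator obtains the versions it will hand out.
    let mut warm_worker = ClientConnection::new();
    warm_worker.record_fetched_read_version(1_000);
    println!("{:?}", warm_worker.validate_read_version(2_000));
    println!("{:?}", warm_worker.validate_read_version(500));
}
```

Under this model the order matters: a worker that fetched its own first version only after the coordinator's would set a floor above the coordinator's version, and that read would still fail. That is consistent with the fix being a single call at startup, but it is an inference from the model rather than something confirmed in this thread.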

Oh right, I remember now: it looks like this was added in 6.2 as part of another feature: