Hi, from time to time, when some of the storage servers are lagging, read attempts fail with error 1037 (process_behind).
For our use case it is perfectly fine to read old versions of the data, so my question is: is there a way to tell the FDB client (Java API) that we want to read the specified range regardless of its version?
The storage server checks the read version of each request in waitForVersion(), and that is where it returns the process_behind error. To bypass this check, a new option would need to be added to the request so that waitForVersion() can behave differently. I don’t think this is super hard, but it does require many client-side and binding changes.
In the Java bindings, calling snapshot() on a Transaction returns a ReadTransaction instance, which has a setReadVersion method. Would forcing a version number with this method achieve the correct behavior, i.e. send that specific version as the expected one so that waitForVersion checks against it?
You can set an arbitrary read version on a transaction, and FDB will try to honor it. The challenge is that you’d have to guess the right read version. Since versions advance at roughly 1,000,000 per second, you could fetch a read version and subtract 2,000,000, which lets a process be up to 2 seconds behind, if what you want is a read that may be 2 seconds stale. But that doesn’t really satisfy your exact request of “give me the most recent data you immediately have”.
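To make the arithmetic above concrete, here is a minimal sketch of the version calculation. `StaleReadVersion` and `behind` are hypothetical names, not part of the FDB bindings, and the 1,000,000 versions per second figure is the approximate rate assumed in the post above, not a guarantee.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of the FDB Java bindings.
final class StaleReadVersion {
    // Assumption: versions advance at roughly 1,000,000 per second,
    // i.e. about 1,000 versions per millisecond.
    static final long VERSIONS_PER_MILLI = 1_000L;

    private StaleReadVersion() {}

    // Returns a read version that is `staleness` older than the given one,
    // clamped at zero so the result is never negative.
    static long behind(long currentReadVersion, Duration staleness) {
        long delta = staleness.toMillis() * VERSIONS_PER_MILLI;
        return Math.max(0L, currentReadVersion - delta);
    }
}
```

With the real bindings this would presumably be used by fetching a version from one transaction (`getReadVersion().join()`) and passing `StaleReadVersion.behind(version, Duration.ofSeconds(2))` to `setReadVersion` on a fresh transaction before its first read. Treat that wiring as an untested assumption.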
I think I was talking with @andrew.noyes about this a year or two ago, and he pointed out that the storage server side of implementing this request is a bit more complicated: it has to make sure that the “most recent version” it offers to a client is one that is promised to be kept after a recovery, and not literally the most recent version, as that would expose to-be-rolled-back data to clients.
I could keep track of the last read version seen in a successful getRange. Assuming we have sufficient reads, this should converge towards the actual current “valid” read version that can be used. So the flow would be to attempt a getRange without specifying a read version and, if error 1037 is encountered, retry the getRange with the last seen read version, so that we serve stale data instead of failing.
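The flow described above could be sketched roughly as follows. To keep the example runnable without a cluster, the FDB calls are hidden behind a hypothetical `RangeRead` interface. `StaleFallbackReader`, `ReadFailed` and `Versioned` are also made-up names for illustration. In real code `latest()` and `at(version)` would presumably wrap a transaction's `getRange` plus `getReadVersion` / `setReadVersion`, and the error code would come from `FDBException.getCode()`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper for illustration; not part of the FDB Java bindings.
final class StaleFallbackReader<T> {
    static final int PROCESS_BEHIND = 1037;
    private static final long NONE = -1L;

    // Hypothetical stand-in for an exception that carries an FDB error code.
    static final class ReadFailed extends RuntimeException {
        final int code;
        ReadFailed(int code) {
            super("read failed with code " + code);
            this.code = code;
        }
    }

    // Hypothetical result type: the data plus the read version it was served at.
    static final class Versioned<V> {
        final V value;
        final long readVersion;
        Versioned(V value, long readVersion) {
            this.value = value;
            this.readVersion = readVersion;
        }
    }

    // Hypothetical hook around the actual range read.
    interface RangeRead<V> {
        Versioned<V> latest();          // read without pinning a read version
        Versioned<V> at(long version);  // read at an explicitly set read version
    }

    private final AtomicLong lastSeen = new AtomicLong(NONE);
    private final RangeRead<T> reader;

    StaleFallbackReader(RangeRead<T> reader) {
        this.reader = reader;
    }

    T read() {
        try {
            Versioned<T> result = reader.latest();
            // Remember the highest read version that produced a successful read.
            lastSeen.accumulateAndGet(result.readVersion, Math::max);
            return result.value;
        } catch (ReadFailed e) {
            long version = lastSeen.get();
            // Only fall back on process_behind, and only if a version is known.
            if (e.code != PROCESS_BEHIND || version == NONE) {
                throw e;
            }
            return reader.at(version).value;
        }
    }
}
```

Two caveats apply to this sketch. A remembered version that is more than a few seconds old is likely to be rejected as too old, so the fallback only helps while successful reads keep arriving. A badly lagging storage server may also still be behind the remembered version, in which case the retry fails with 1037 as well.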