FoundationDB

FoundationDB error code 1009 (Request for future version)


(seddonm1) #1

Hi,
I am trying to build a deterministic simulation using the Golang drivers and am seeing a non-deterministic error from FoundationDB: ‘FoundationDB error code 1009 (Request for future version)’.

Currently the logic uses a Watch: each time the FutureNil resolves, I pull the latest records (GetRange), push them to a Golang channel, and reset the Watch. The pattern was described here: Transaction.Watch Examples?

I assume this is a race condition, but can you please elaborate on the circumstances that produce this error (a Google search returns nothing)? Perhaps the comments about Snapshot reads will prevent this error?

On another topic, this code works with 5.1.7 but hangs on 5.2.5, suggesting some sort of change or regression.

Thanks


(Steve Atherton) #2

Request for Future Version is indeed the result of a race between the client’s request to a storage server for data at some version and the storage server’s receipt of mutations up to and including that version from the log system. Using a snapshot read will not prevent the error.

You should not see this error in a healthy cluster that is not saturated. The flow of committed mutations from proxies -> logs -> storage servers is pipelined such that, under normal conditions, while knowledge of a version’s durability is in transit from a log to the proxies, the new durable mutations are in transit from that log to the storage servers reading from it. A proxy can only hand out a specific version as a read version after learning it is durable on all logs (plus some other communication). So to get a future version error, a client must get a read version and use it on a storage server before the storage server can get the mutations for that version.

This scenario should mean that something is bottlenecking the flow of mutations from logs to storage servers. Also, before returning a future version error, a storage server will wait a small amount of time to see if the new version arrives. So, perhaps your cluster is under a saturating workload?


(seddonm1) #3

Thanks for such a great answer Steve.

This ‘cluster’ is currently a local installation on a late-2013 MacBook Pro with 16GB RAM. What I suspect is that I am experiencing the proxies -> logs -> storage delay due to memory swapping (the Docker daemon plus a few instances of Electron-based text editors can consume a crazy amount of RAM).

I have had a 3-node dedicated Intel NUC cluster doing 10 writes/sec with watches on all nodes for a week or so with no problem like this. I will do some testing with more free RAM to see if I can confirm that.

It would be awesome if your team could add some descriptions like you have provided to the official page: https://apple.github.io/foundationdb/api-error-codes.html to help people understand what might be the cause of some of these errors without having to disturb you.

Workload:
  Read rate              - 602 Hz
  Write rate             - 333 Hz
  Transactions started   - 209 Hz
  Transactions committed - 103 Hz
  Conflict rate          - 0 Hz

(A.J. Beamon) #4

When you say it hangs on 5.2.5, are both the client and server running 5.2? A hang is the expected behavior if you are running mixed versions (e.g. 5.1 and 5.2), as different major/minor versions are not compatible.
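As a rough illustration of the rule as stated here, the check below compares only the major and minor components of two version strings. It is a sketch, not how FoundationDB itself decides compatibility; `majorMinor` and `compatible` are hypothetical helpers written for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts the major and minor components of a version string
// such as "5.2.5". Any patch component is ignored.
func majorMinor(v string) (int, int, error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("version %q has no minor component", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("version %q: bad major component: %w", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("version %q: bad minor component: %w", v, err)
	}
	return major, minor, nil
}

// compatible reports whether client and server share the same major and
// minor version, which is the rule of thumb described in the post.
func compatible(client, server string) (bool, error) {
	cMajor, cMinor, err := majorMinor(client)
	if err != nil {
		return false, err
	}
	sMajor, sMinor, err := majorMinor(server)
	if err != nil {
		return false, err
	}
	return cMajor == sMajor && cMinor == sMinor, nil
}

func main() {
	pairs := [][2]string{{"5.1.7", "5.2.5"}, {"5.2.5", "5.2.5"}}
	for _, p := range pairs {
		ok, err := compatible(p[0], p[1])
		fmt.Printf("client %s, server %s: compatible=%v err=%v\n", p[0], p[1], ok, err)
	}
}
```

A check like this could run at startup so that a mixed-version deployment fails loudly instead of appearing to hang.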


(seddonm1) #5

Hi,
Update:

  • Giving FoundationDB enough RAM seems to have solved the 1009 error.

Re 5.1.7 vs 5.2.5:

I carefully built the environments with Docker to ensure versions were aligned.

Everything would execute correctly for a while, then after some time the Golang code became unresponsive. The FoundationDB cluster was still fine, which I could prove by restarting the Golang program. I downgraded everything to 5.1.7 and the same program does not hang. We can continue the discussion in a different thread, as others may have experienced the same problem and could help reproduce it.


(A.J. Beamon) #6

Yeah, it sounds like what you’re doing should work with respect to the versions. Feel free to raise this issue in another thread, and we can try to figure out what’s going on.