Simulation of disk I/O

I was looking at the YouTube talk "Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson, which is about deterministic simulation for testing. It seems very interesting. I can see how mocking out networking would work, since networking is just communication with some chaos added in. But disk I/O actually needs to write bytes at some point, and probably needs to read them back. Just writing into memory limits the scale of a test. So what do you do about disk I/O? Can anyone point me at the relevant parts of the FoundationDB code that implement this?

Thanks,
Sean

The file system layer in simulation is quite complex. If you want to look at the code you can find it in these files:

  • AsyncFileNonDurable.actor.h (the classes AsyncFileDetachable and AsyncFileNonDurable)
  • AsyncFileNonDurable.actor.cpp
  • sim2.actor.cpp (implements SimpleFile)

SimpleFile is the lowest level of abstraction here. The basic idea is:

  1. All disk I/O is done using blocking I/O.
  2. Since we simulate time, these syscalls appear to complete in 0 seconds.
  3. We add delay statements with a random duration before and after each syscall to simulate them taking time (this is also a form of failure injection: we can simulate a disk being very slow).
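The three steps above can be sketched in Python. This is a simplified, single-threaded illustration under stated assumptions, not FoundationDB's code: `SimClock`, `SimpleSimFile`, `max_delay`, `slow_prob` and `run` are hypothetical names. Each operation performs a real blocking write, read or fsync, which costs zero simulated time, and is bracketed by random delays drawn from a seeded RNG, so the same seed always reproduces the same timing. In the real simulator other actors would run while one is delayed; here a delay simply advances the clock.

```python
import os
import random
import tempfile


class SimClock:
    """Hypothetical simulated clock: time only moves when the simulator says so."""

    def __init__(self) -> None:
        self.now = 0.0

    def delay(self, seconds: float) -> None:
        # A real simulator would run other tasks during this "wait";
        # this sketch just moves simulated time forward.
        self.now += seconds


class SimpleSimFile:
    """Hypothetical SimpleFile-like wrapper: real blocking I/O surrounded
    by random simulated delays."""

    def __init__(self, path: str, clock: SimClock, rng: random.Random,
                 max_delay: float = 0.01, slow_prob: float = 0.05) -> None:
        self.f = open(path, "w+b")
        self.clock = clock
        self.rng = rng
        self.max_delay = max_delay
        self.slow_prob = slow_prob

    def _random_delay(self) -> None:
        d = self.rng.uniform(0.0, self.max_delay)
        if self.rng.random() < self.slow_prob:
            d *= 100  # occasionally simulate a very slow disk
        self.clock.delay(d)

    def write(self, offset: int, data: bytes) -> None:
        self._random_delay()      # simulated time before the syscall
        self.f.seek(offset)
        self.f.write(data)        # real, blocking write: 0 simulated seconds
        self._random_delay()      # simulated time after the syscall

    def read(self, offset: int, length: int) -> bytes:
        self._random_delay()
        self.f.seek(offset)
        data = self.f.read(length)
        self._random_delay()
        return data

    def sync(self) -> None:
        self._random_delay()
        self.f.flush()
        os.fsync(self.f.fileno())  # real fsync, still 0 simulated seconds
        self._random_delay()

    def close(self) -> None:
        self.f.close()


def run(seed: int):
    """Write, sync and read back one record; return (data, simulated time)."""
    clock = SimClock()
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as d:
        f = SimpleSimFile(os.path.join(d, "data.bin"), clock, rng)
        f.write(0, b"hello")
        f.sync()
        data = f.read(0, 5)
        f.close()
    return data, clock.now
```

In this sketch the bytes really land in a temporary file rather than a memory buffer, while the elapsed simulated time depends only on the seed: calling `run(42)` twice yields identical results.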

Additionally, there's logic to randomly throw exceptions. We can also do other things, like randomly corrupting data. But keep in mind that this is difficult, since FDB doesn't protect against Byzantine failures. So there are certain failures a disk could exhibit in theory that we might not simulate, because we know FDB wouldn't survive them. Though the only such failure I can think of is a number of writes followed by an fsync returning successfully while the data never actually gets written.
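The exception-throwing and data-corruption behavior described above can be illustrated with a small wrapper. Again, this is a hypothetical sketch and not the FDB implementation: `FaultInjectingFile`, `InjectedIOError`, `error_prob`, `corrupt_prob` and `bit_diff` are assumed names, and `io.BytesIO` stands in for a real file only to keep the example short. Corruption is modeled here as a single flipped bit, which is just one possible choice.

```python
import io
import random


class InjectedIOError(IOError):
    """Raised when the simulator decides a disk operation should fail."""


class FaultInjectingFile:
    """Hypothetical wrapper that randomly fails or corrupts I/O on any
    seekable binary file object."""

    def __init__(self, backing, rng: random.Random,
                 error_prob: float = 0.0, corrupt_prob: float = 0.0) -> None:
        self.backing = backing
        self.rng = rng
        self.error_prob = error_prob
        self.corrupt_prob = corrupt_prob

    def _maybe_fail(self, op: str) -> None:
        if self.rng.random() < self.error_prob:
            raise InjectedIOError(f"injected {op} failure")

    def write(self, offset: int, data: bytes) -> None:
        self._maybe_fail("write")
        buf = bytearray(data)
        if buf and self.rng.random() < self.corrupt_prob:
            # Flip one random bit to simulate silent data corruption.
            i = self.rng.randrange(len(buf))
            buf[i] ^= 1 << self.rng.randrange(8)
        self.backing.seek(offset)
        self.backing.write(bytes(buf))
        # Deliberately NOT modeled: a write + sync that both succeed while
        # the data is silently dropped (a "lost write").

    def read(self, offset: int, length: int) -> bytes:
        self._maybe_fail("read")
        self.backing.seek(offset)
        return self.backing.read(length)

    def sync(self) -> None:
        self._maybe_fail("sync")
        self.backing.flush()


def bit_diff(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Because the RNG is seeded, a test that hits an injected failure or corruption can be replayed exactly by rerunning with the same seed.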
