Flow & determinism & logical time

Can I get some idea of how deterministic I/O is arranged in a single thread?

Flow “is able to conduct a deterministic simulation of an entire FoundationDB cluster within a single-thread!”

There must be a logical clock, and a way for each I/O component to advance one state at a time, depending on the order in which the single thread calls actor instances.

I am not sure I understand your question correctly, or rather which part of the system you want information about. What you’re asking is a complex, multi-layered question.

But basically, we use coroutines to run an event-driven loop (our default implementation uses a mix of boost asio, kernel AIO, and eio). So outside of simulation, a single process can saturate a CPU core while executing many tasks concurrently.

In simulation, we have a different implementation of the low-level event loop. It:

  1. implements its own clock source
  2. uses blocking IO for disk operations, plus a sleep to simulate latency
  3. uses simple in-memory queues to simulate the network, again with sleep to simulate latency

Each type of IO implements its own contract to simulate hardware (so network connections can fail or congest, disk IO can corrupt files, disk IO is ordered per disk, etc.).

In simulation, all operations are synchronous, either because they don’t do actual IO (as with the simulated network) or because they are implemented to be synchronous (as with disk IO). So in order to simulate asynchronous operations, we inject semi-random sleeps when executing them. A sleep (or delay, as it is called in Flow) simply puts a coroutine into a queue.

The clock is also simulated and works as follows:

  1. It is initialized to 0 at startup
  2. Each coroutine/task in the main queue has a time value associated with it (so when you call co_await delay(1.0); under the covers we put the current coroutine into the task queue and set now() + 1.0 as the time field). This main queue is implemented as a simple min-priority queue with time as priority
  3. The loop will pop the task with the smallest time from the queue, update the time to whatever the time is this task is meant to run at, and execute the task (re-enter the associated coroutine).

So in pseudo-code, a file-read might be implemented something like this (minus the error handling):

if (g_simulator->getCurrentProcess()->failedDisk) {
  co_await Never(); // simulate a failed disk: this coroutine is never resumed
}
// make sure that all operations on a file are ordered and each operation has some
// latency
diskParameters->nextOperation = std::max(diskParameters->nextOperation, now());
auto randomLatency = 10 * deterministicRandom()->random01() / diskParameters->iops;
// don't consume more iops than this disk provides
diskParameters->nextOperation += (1.0 / diskParameters->iops) + (size / diskParameters->bandwidth);
// wait until we can read
co_await delay(diskParameters->nextOperation + randomLatency - now());
// this is a synchronous operation and will not consume any (simulated) time
co_return read(fd, data, length);

One big benefit of simulated time is that it can pass much more quickly than real time: delays consume no wall-clock time, so we can always saturate a CPU core, even when the simulated system is stuck and making little progress.