Hi, we have started working on this project again and found out a lot of new things. We would really appreciate confirmation of some points, as many of them were “reverse engineered”.
First of all, regarding the compilation problem: we haven’t identified exactly why it happens, but we found a temporary fix by simply compiling in the official foundationdb container. Apart from replacing “g++” with “c++”, our Makefile is unchanged, so the problem is not due to compiler arguments.
With a logger and database finally working, we started working on the Rust bindings again. After many hours of debugging, we arrived at this mental representation of what’s going on:
- fdb calls the setup method of the workload
- fdb can’t do anything until setup returns, i.e. any future created in setup won’t be resolved until we exit, so waiting for a future inside setup results in a deadlock
- fdb won’t call start until the GenericPromise done is resolved, so if no boolean is ever sent to done, it results in a deadlock
- once done is resolved, fdb calls start and the database can move in memory, so any callback set by setup that runs after this point and still holds the previous database pointer will most likely crash when trying to use it
- done is a GenericPromise and holds a smart pointer to a FDBPromise; if its last reference is dropped, fdb knows and throws a “broken_promise” error, so we have to be careful when passing it between C++ and Rust (through a C interface)
It also seems that start and check behave exactly the same way. More generally, it seems that whenever execution is granted to the workload (either in a “step method” like setup or in a callback we defined), fdb is paused (we assume this is done to preserve determinism). So we have to chain callbacks every time we want to wait on an action that returns a future.
As this is very verbose to write (and we would like to use foundationdb-rs, which abstracts the raw bindings and uses Rust Futures), we tried to use async/await in a blocking runtime on a separate thread:
    fn setup(&mut self, db: Database, done: GenericPromise<bool>) {
        std::thread::spawn(move || {
            // on a separate thread, create and poll futures
            runtime.block_on(async {
                // it crashes here, as we use db
                let trx = db.create_trx().unwrap();
                trx.set_read_version(42);
                // sets a callback and waits for it to be called,
                // similar to fdb_future_block_until_ready
                let version1 = trx.get_read_version().await.unwrap();
                // "chained future"
                let version2 = trx.get_read_version().await.unwrap();
                done.send(true);
            });
        });
        // returns execution to fdb so futures will be resolved
    }
But this crashes as soon as we try to use db, and it seems to be because we are running on a thread that is not managed by fdb.
The “pseudo code” of a working version of the above example would be something like:
    fn setup(&mut self, db: Database, done: GenericPromise<bool>) {
        let trx = db.create_trx();
        let f = fdb_transaction_get_read_version(trx);
        fdb_future_set_callback(f, callback1, CallbackData { trx, done });
    }

    fn callback1(f: *mut FDBFuture, data: CallbackData) {
        let mut version1;
        fdb_future_get_int64(f, &mut version1);
        let f = fdb_transaction_get_read_version(data.trx);
        fdb_future_set_callback(f, callback2, data);
    }

    fn callback2(f: *mut FDBFuture, data: CallbackData) {
        let mut version2;
        fdb_future_get_int64(f, &mut version2);
        data.done.send(true);
    }
We think we can emulate the second version by writing our own runtime, which would simplify the code to:
    fn setup(&mut self, db: Database, done: GenericPromise<bool>) {
        our_runtime_callback(async {
            let trx = db.create_trx().unwrap();
            trx.set_read_version(42);
            let version1 = trx.get_read_version().await.unwrap();
            let version2 = trx.get_read_version().await.unwrap();
            done.send(true);
        });
    }
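The idea of a runtime driven purely by callbacks can be sketched without any fdb bindings at all. Everything in the sketch below is an assumption made for illustration: `spawn_on_callbacks` plays the role of our_runtime_callback, and `MockFuture` with its `fire` method stands in for an fdb future whose callback is invoked by the simulator. There is no run loop and no extra thread: the task is polled once when spawned, then once per wake-up, on whichever thread calls the waker. It assumes all wake-ups arrive on a single thread.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

/// One spawned async block; no run loop and no extra thread.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    woken_during_poll: AtomicBool,
}

impl Wake for Task {
    // Invoked by whatever callback completes the awaited future.
    fn wake(self: Arc<Self>) {
        poll_task(&self);
    }
}

fn poll_task(task: &Arc<Task>) {
    // Take the future out so a wake() issued during poll() cannot re-enter it.
    let taken = task.future.lock().unwrap().take();
    let Some(mut fut) = taken else {
        task.woken_during_poll.store(true, Ordering::SeqCst);
        return;
    };
    let waker = Waker::from(Arc::clone(task));
    let mut cx = Context::from_waker(&waker);
    loop {
        task.woken_during_poll.store(false, Ordering::SeqCst);
        if fut.as_mut().poll(&mut cx).is_ready() {
            return; // finished: the future is dropped here
        }
        if !task.woken_during_poll.load(Ordering::SeqCst) {
            break; // still pending: park it until the next wake()
        }
    }
    *task.future.lock().unwrap() = Some(fut);
}

/// Hypothetical counterpart of `our_runtime_callback`.
fn spawn_on_callbacks(fut: impl Future<Output = ()> + Send + 'static) {
    let boxed: BoxFuture = Box::pin(fut);
    let task = Arc::new(Task {
        future: Mutex::new(Some(boxed)),
        woken_during_poll: AtomicBool::new(false),
    });
    poll_task(&task); // runs up to the first pending await, then returns
}

/// Mock of an fdb future: (value, waker), resolved only by `fire`.
#[derive(Clone, Default)]
struct MockFuture(Arc<Mutex<(Option<i64>, Option<Waker>)>>);

impl Future for MockFuture {
    type Output = i64;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i64> {
        let mut slot = self.0.lock().unwrap();
        if let Some(v) = slot.0 {
            return Poll::Ready(v);
        }
        slot.1 = Some(cx.waker().clone());
        Poll::Pending
    }
}

impl MockFuture {
    /// Simulates the simulator invoking the callback set on the future.
    fn fire(&self, v: i64) {
        let mut slot = self.0.lock().unwrap();
        slot.0 = Some(v);
        let waker = slot.1.take();
        drop(slot); // release the lock: wake() re-polls inline
        if let Some(w) = waker { w.wake(); }
    }
}

/// Usage with the same shape as `setup`: two chained awaits, then "done".
fn demo() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let (f1, f2) = (MockFuture::default(), MockFuture::default());
    let (a, b, l) = (f1.clone(), f2.clone(), Arc::clone(&log));
    spawn_on_callbacks(async move {
        let _version1 = a.await;
        l.lock().unwrap().push("version1 resolved");
        let _version2 = b.await;
        l.lock().unwrap().push("version2 resolved, send done here");
    });
    // "setup" has returned; the simulator now runs the callbacks in turn.
    f1.fire(1);
    f2.fire(2);
    let out = log.lock().unwrap().clone();
    out
}
```

The `Send` bound on the future comes from `std::task::Wake`; lifting it would need a hand-built `RawWaker`. Whether the real fdb futures can be driven this way inside the simulator is exactly the open question above.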
But we would like to be sure we really understand how we are supposed to use the simulator before sinking more hours into it and potentially finding out that it simply can’t work like that because we overlooked something. So, did we understand correctly how it works so far? Are we wrong on some points? And can you think of any important points we didn’t mention?
On a side note, I had some trouble passing along the done GenericPromise. The solution I got working is the following (in the C++ wrapper):
    struct RustWorkload;

    template<typename T>
    struct Wrapper {
        T inner;
    };

    extern "C" void rust_setup(
        RustWorkload*,
        FDBDatabase*,
        Wrapper<GenericPromise<bool>>*
    );

    class WorkloadTranslater: public FDBWorkload {
    private:
        RustWorkload* rustWorkload;
    public:
        virtual void setup(
            FDBDatabase* db,
            GenericPromise<bool> done
        ) override {
            // this increments the ref counter as done is copied
            auto wrapped = new Wrapper<GenericPromise<bool>> { done };
            rust_setup(this->rustWorkload, db, wrapped);
        } // the ref counter is decremented as done goes out of scope
    };
and Rust can call FDBPromise_send_bool, which is defined in C++:
    extern "C" void FDBPromise_send_bool(
        Wrapper<GenericPromise<bool>>* promise,
        bool val
    ) {
        promise->inner.send(val);
        delete promise;
    }
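On the Rust side, one way to consume this pointer could be sketched as an owning handle whose `send` takes `self` by value, so the wrapper cannot be touched again after C++ has deleted it. The names `WrappedPromise` and `DonePromise` are hypothetical, and FDBPromise_send_bool is replaced here by a Rust mock so the sketch runs on its own; a real binding would declare it in an `extern "C"` block instead.

```rust
use std::ptr::NonNull;
use std::sync::Mutex;

/// Opaque view of the C++ `Wrapper<GenericPromise<bool>>`; Rust never reads its fields.
#[repr(C)]
pub struct WrappedPromise {
    _opaque: [u8; 0],
}

// Mock standing in for the C++ function (assumption for this sketch only).
// It records the value instead of sending it and deleting the wrapper.
static SENT: Mutex<Vec<bool>> = Mutex::new(Vec::new());

#[allow(non_snake_case)]
unsafe extern "C" fn FDBPromise_send_bool(_promise: *mut WrappedPromise, val: bool) {
    SENT.lock().unwrap().push(val);
}

/// Owning handle around the raw pointer received in `rust_setup`.
pub struct DonePromise {
    raw: NonNull<WrappedPromise>,
}

impl DonePromise {
    /// # Safety
    /// `raw` must come from the C++ side and must not be wrapped twice.
    pub unsafe fn from_raw(raw: *mut WrappedPromise) -> Option<Self> {
        NonNull::new(raw).map(|raw| DonePromise { raw })
    }

    /// Consumes the handle, so a second `send` is a compile error.
    pub fn send(self, val: bool) {
        // Safety: `self` is consumed, so this pointer reaches C++ exactly once.
        unsafe { FDBPromise_send_bool(self.raw.as_ptr(), val) }
    }
}
```

With this shape, dropping a `DonePromise` without calling `send` leaks the C++ wrapper and leaves done unresolved, so every code path still has to end in a `send`.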
Does this seem reasonable? Or can you think of a better/simpler solution?