Cpp implementation of an ExternalWorkload

Hi, I am working with @PierreZ on integrating a Rust workload into the fdbserver simulation. To do so, I wrote a C++ shared object (libcppwrapper.so) that is loaded by the ExternalWorkload of fdbserver. The C++ workload exposes bindings that I can then use from Rust. In the end I simply call fdbserver -r simulation -f ./test.txt, with test.txt as follows:

testTitle=MyTest
  testName=External
  workloadName=TestWorkload
  libraryPath=path/to/lib/
  libraryName=cppwrapper

This scheme can already link the methods of ExternalWorkload (init, setup, start, check and getMetrics) to a Rust implementation. Two things are currently not working: I can’t call the logger's trace method (either through an FDBLogger instance or an FDBWorkloadContext instance), and calling most C bindings from libfdb_c.so results in segmentation faults. To narrow this down, I simplified the project by removing the Rust part. The C++ code now calls the logger and the C bindings directly, but I get exactly the same errors.

Here is the GitHub repository (PierreZ/test-fdb-workload) with all the code and a CI pipeline that reproduces the situation I’m describing.

Do you know what might be the cause of this?


One problem I see so far is that you need to call fdb_select_api_version. See C API — FoundationDB 7.1.

There is an example external workload here that seems to work: foundationdb/SimpleWorkload.cpp at 1328c343231e447c1822ba1edea8c315c194955e · apple/foundationdb · GitHub. It does get compiled and linked with some very specific flags (which you can see if you build with ninja -v or make VERBOSE=1), and it’s possible that some of them are necessary to work properly.

Thanks for the quick answer. Indeed, adding fdb_select_api_version fixed some problems, and C bindings like fdb_get_client_version now work correctly. However, it didn’t fix everything: the trace method of the logger still produces invalid behavior, and more complex C bindings like fdb_future_block_until_ready crash fdbserver. From what we can see, most of the pointers passed between ExternalWorkload and our workload implementation seem to be somewhat sensitive. As you suggested, we also think this is due to differences in compilation between the two sides. But we can’t make sense of the compilation commands used for (for example) SimpleWorkload.cpp:


/opt/rh/devtoolset-8/root/usr/bin/c++ -DBOOST_ERROR_CODE_HEADER_ONLY -DBOOST_SYSTEM_NO_DEPRECATED -DNO_INTELLISENSE -Dc_workloads_EXPORTS -I/root/foundationdb -I. -Ibindings/c -I/root/foundationdb/bindings/c -Ibindings/c/foundationdb -O3 -DNDEBUG -fPIC -DCMAKE_BUILD -ggdb -fno-omit-frame-pointer -mavx -Wno-pragmas -Wno-attributes -Wno-error=format -Wunused-variable -Wno-deprecated -fvisibility=hidden -Wreturn-type -fPIC -Wclass-memaccess -DHAVE_OPENSSL -std=gnu++17 -MD -MT bindings/c/CMakeFiles/c_workloads.dir/test/workloads/SimpleWorkload.cpp.o -MF bindings/c/CMakeFiles/c_workloads.dir/test/workloads/SimpleWorkload.cpp.o.d -o bindings/c/CMakeFiles/c_workloads.dir/test/workloads/SimpleWorkload.cpp.o -c /root/foundationdb/bindings/c/test/workloads/SimpleWorkload.cpp

&& /opt/rh/devtoolset-8/root/usr/bin/c++ -fPIC -O3 -DNDEBUG -static-libstdc++ -static-libgcc -Wl,--version-script=/root/foundationdb/bindings/c/external_workload.map,-z,nodelete -shared -Wl,-soname,libc_workloads.so -o share/foundationdb/libc_workloads.so bindings/c/CMakeFiles/c_workloads.dir/test/workloads/workloads.cpp.o bindings/c/CMakeFiles/c_workloads.dir/test/workloads/SimpleWorkload.cpp.o -Wl,-rpath,/root/build_output/lib lib/libfdb_c.so && :

Using GitHub Actions, we created two branches. You can see the results and log files on their respective CI pipelines.

We have separated the two cases for isolation purposes, but we believe the two problems are related.

We tried to compile the SimpleWorkload example and make it work locally, but couldn’t. First, the example seems incomplete: FDBWorkloadFactoryImpl::create looks up the name of the workload in a static _factories map, but SimpleWorkload doesn’t seem to be registered there. Second, manually instantiating SimpleWorkload later yields the same errors (bad trace logging and segfaults on C bindings) as our workload implementation, which further supports the theory of a compilation difference.
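To make the registration concern concrete, here is a minimal sketch of the kind of name-to-factory lookup we mean. It is only an illustration of the pattern: Workload, Registrar, registry and createWorkload are hypothetical names we made up and do not mirror the real FDBWorkloadFactoryImpl code.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a workload interface; NOT the real FDBWorkload.
struct Workload {
    virtual ~Workload() = default;
    virtual std::string description() const = 0;
};

using WorkloadCreator = std::function<std::unique_ptr<Workload>()>;

// Function-local static: the map is constructed on first use, which avoids
// static-initialization-order problems between translation units.
inline std::map<std::string, WorkloadCreator>& registry() {
    static std::map<std::string, WorkloadCreator> factories;
    return factories;
}

// Constructing a Registrar<T> at namespace scope adds T to the registry
// when the shared object is loaded.
template <class T>
struct Registrar {
    explicit Registrar(const std::string& name) {
        registry()[name] = [] { return std::unique_ptr<Workload>(new T()); };
    }
};

// Look up a workload by name; returns nullptr if nothing was registered.
inline std::unique_ptr<Workload> createWorkload(const std::string& name) {
    auto it = registry().find(name);
    if (it == registry().end()) return nullptr;
    return it->second();
}

struct SimpleWorkload : Workload {
    std::string description() const override { return "SimpleWorkload"; }
};

// Without this line, createWorkload("SimpleWorkload") would return nullptr.
static Registrar<SimpleWorkload> simpleWorkloadRegistrar("SimpleWorkload");
```

In this sketch, a lookup only succeeds if a registrar object for that name exists somewhere in the loaded library, which is the step we could not find for SimpleWorkload.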

Do you see any other possible cause for these problems? Do you think they are related? And if it is a compilation difference, can you see why?