Hello *,
I’ve been working on improving the FoundationDB packaging for NixOS, and in the course of testing new things for FoundationDB 6.1, I wanted to run simulation tests at a larger, more convenient scale. This is important for making sure the binaries shipped to users are reliable, and it applies to any third-party packaging in general. Out of curiosity, I set out to do this using Kubernetes, and I’d like feedback on the approach.
The result is a set of tooling, available here. TL;DR: if you’re impatient, you can read the README and get a feel for it.
There are many details in the README and, if you follow the instructions, it should Pretty Much Work™.
It’s quite easy to build Docker images using Nix, which is the basis of this tooling. Essentially, I create a Docker image out of the FoundationDB Nix packages, equipped with a shell script to run simulation tests. The tests come directly from the source repository and are packaged into the distribution; they are the same ones run with ctest in the CMake build system. The list of included tests is here: https://github.com/thoughtpolice/nixpkgs/blob/nixpkgs/fdb-61-fixes/pkgs/servers/foundationdb/test-list.txt
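For the curious, the image itself is put together with Nix’s dockerTools. Here’s a minimal sketch of what such an expression can look like (the simulate.sh file and the foundationdb61 attribute are illustrative assumptions on my part, not the exact expression from the repository):

{ pkgs ? import <nixpkgs> {} }:

let
  # Hypothetical wrapper script; a sketch of its contents appears below.
  simulate = pkgs.writeScriptBin "simulate" (builtins.readFile ./simulate.sh);
in
pkgs.dockerTools.buildImage {
  name = "foundationdb";
  tag = "6.1.5pre4879_91547e7";

  # fdbserver, the packaged test files, and the wrapper all land in /bin,
  # so `docker run ... simulate <test> <rounds>` resolves via the default PATH.
  contents = [ pkgs.foundationdb61 simulate ];
}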
The wrapper script gives you a little tool for running any simulation test a number of times, right out of the Docker image. For example, you can run the fast/AtomicOps.txt test like so:
$ docker run --rm foundationdb:6.1.5pre4879_91547e7 simulate fast/AtomicOps 10
NOTE: simulation test fast/AtomicOps (10 rounds)...
NOTE: running simulation #1 (seed = 0x136276c4)... ok
NOTE: running simulation #2 (seed = 0x2b807568)... ok
NOTE: running simulation #3 (seed = 0x8b3e79e3)... ok
NOTE: running simulation #4 (seed = 0x11581fb5)... ok
NOTE: running simulation #5 (seed = 0x88159bf0)... ok
NOTE: running simulation #6 (seed = 0xa27d94fb)... ok
NOTE: running simulation #7 (seed = 0x5081e040)... ok
NOTE: running simulation #8 (seed = 0x9ad8c268)... ok
NOTE: running simulation #9 (seed = 0x741db05f)... ok
NOTE: running simulation #10 (seed = 0x462fd316)... ok
NOTE: finished fast/AtomicOps; 10 total sim rounds, 10/10 successful sim runs
$
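Under the hood, the wrapper is just a small loop around fdbserver’s simulation mode. A minimal sketch of what such a script can look like (the /tests path and the exact output format are assumptions on my part):

#!/usr/bin/env bash
# Run the named simulation test N times with random seeds; report pass/fail.
set -euo pipefail

test="$1"
rounds="${2:-1}"
ok=0

echo "NOTE: simulation test ${test} (${rounds} rounds)..."
for i in $(seq 1 "${rounds}"); do
  # fdbserver takes a numeric seed via -s; RANDOM is only 15 bits, so widen it.
  seed=$(( (RANDOM << 15) | RANDOM ))
  printf 'NOTE: running simulation #%d (seed = 0x%08x)... ' "${i}" "${seed}"
  if fdbserver -r simulation -f "/tests/${test}.txt" -s "${seed}" >/dev/null 2>&1; then
    echo "ok"
    ok=$(( ok + 1 ))
  else
    echo "FAILED"
  fi
done
echo "NOTE: finished ${test}; ${rounds} total sim rounds, ${ok}/${rounds} successful sim runs"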
The intention here is that you then create a K8S batch Job out of every simulation test, possibly with tweaked concurrency/memory limits, and then throw it all at a cluster a number of times to test things out. The remaining tooling is built on top of this basic image; e.g. a concurrent, 100-round simulation of fast/AtomicOps.txt is packaged into a job like so:
$ cat result/simulation-fast-atomicops.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: simulation-fast-atomicops
  labels:
    group: simulation
    test: rare-fast-atomicops
spec:
  parallelism: 2
  completions: 4
  template:
    metadata:
      name: sim-fast-atomicops
      labels:
        group: simulation
        test: fast-atomicops
    spec:
      containers:
      - name: sim-fast-atomicops
        image: foundationdb:6.1.5pre4879_91547e7
        args: [ "simulate", "fast/AtomicOps", "25" ]
        resources:
          limits:
            memory: 768M
          requests:
            memory: 128M
      restartPolicy: Never
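From there, submitting and monitoring the whole batch is ordinary kubectl work; assuming all of the generated manifests live under result/, something like:

$ kubectl apply -f result/
$ kubectl get jobs -l group=simulation --watch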
This project isn’t done yet and is currently only used to smoke-test my own FoundationDB packages. In the future it would be nice to:
- Let people specify a source directory of their own (e.g. their own local working copy) to build and package into the image. It’ll be compiled, packaged, etc. automatically by Nix.
- Adjust the limits, memory requirements, etc. for each job to make the K8S scheduler’s life easier. Right now the estimates are all very conservative.
In the long run, this could perhaps serve as a basis for an open way to run large-scale simulation tests for FoundationDB builds, CI systems, etc. (something that is, to my knowledge, only done by people at places like Apple and Snowflake).
Some questions for feedback:
- Does this seem like a remotely sane method of doing tests, or am I doing something horribly wrong? E.g. is there a better way to achieve this with K8S?
- Am I missing anything critical when running the simulation mode? (Something like fdbserver -r simulation -f "${PATH_TO_TEST_FILE}" -s "${SEED_NUMBER}")
- What kind of simulation numbers should we be aiming at for “reliable” testing? What numbers does Apple use? With this framework you could, in theory, run every test thousands or tens of thousands of times if you desired. 1,000 runs per test? 50,000? A mix depending on the test category (fast/slow/rare)?
- Are the sets of tests I’ve chosen (in test-list.txt) correct and/or reasonable?
- Would it be possible to update the CMake build system to include the testing .txt simulation files? I do this in the NixOS package for FoundationDB, but it’s a bespoke change. They’re very small, and it might be nice to have the build system do this officially and ship the set of tests that are expected to work reliably.