We’ve forked the FoundationDB repo and are working on building a custom FDB image.
We know that simulation/correctness tests are run on every merge, but since we are building from a forked repo, is there an open-source testing framework/infrastructure that we can add to our pipeline? We see that we can run each individual test manually like so:
./bin/fdbserver -r simulation -b on -s 1 -f tests/fast/AtomicOps.txt
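As a stopgap before a full harness is in place, the manual invocation above can be looped over every test file and a handful of seeds. This is only a minimal sketch: the helper names `simulation_commands` and `run_all` are made up for illustration, the flags are copied from the command above, and treating a nonzero exit status as a failure is an assumption (a real harness would likely also need to inspect the trace output).

```python
import glob
import os
import subprocess


def simulation_commands(fdbserver, test_dir, seeds):
    """Build one fdbserver command line per (test file, seed) pair.

    The flags mirror the manual invocation: -r simulation -b on -s <seed> -f <file>.
    """
    commands = []
    for test_file in sorted(glob.glob(os.path.join(test_dir, "*.txt"))):
        for seed in seeds:
            commands.append(
                [fdbserver, "-r", "simulation", "-b", "on",
                 "-s", str(seed), "-f", test_file]
            )
    return commands


def run_all(commands):
    """Run each command in turn and return the ones that exited nonzero.

    Assumption: a nonzero exit status means the run failed.
    """
    failures = []
    for cmd in commands:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures.append(cmd)
    return failures
```

For example, `run_all(simulation_commands("./bin/fdbserver", "tests/fast", range(1, 4)))` would run every `.txt` test under `tests/fast` with seeds 1 to 3, one after another. It is sequential on purpose, which is fine for a smoke test but far too slow for anything approaching the run counts of a nightly.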
It was mentioned that the open-source project performs extensive performance testing. Is there already a framework in place to run all these tests (simulation/correctness/performance) that we can use? We would need to replicate that testing at the same level to ensure that our changes don’t degrade performance at the scale we require.
The software we are using is mostly open source, but building this infrastructure will be non-trivial.
To run correctness (including valgrind and bindingtester tests) we use Joshua, which you can find here. We (Snowflake) use Kubernetes to automatically scale this cluster up and down. For nightlies we run:
100k correctness tests
100k bindingtester runs
10k valgrind runs
The Docker images we use are also publicly available. We also use some application-specific tests; those are not publicly available.
For performance we use mostly Mako which is part of the FDB source (there’s a C implementation and a workload – using the workload might be a bit easier). Mako is very useful for micro-benchmarking. Additionally we also have a complex synthetic internal workload we use for performance testing, but this again is not available to the community (it also wouldn’t be very useful).
There are other options you can use:
There’s a TPC-C implementation which ships with FDB
There’s a YCSB implementation that Apple is using. I don’t know where this implementation is, but I believe it is also open source (though YCSB is IMHO not a good benchmark for FDB – but this statement could also depend on the workload you’re using).
Obviously the easiest solution for you would be to contribute back to open source and we will run the tests for you.
Hey, I’ve been having trouble trying to set up Joshua to run the correctness tests. This is how our file structure is set up:
ubuntu@machine1:~/custom-fdb-6.3$ ls -1
build/ (which is our build directory)
fdb-joshua/ (repo for fdb-joshua)
foundationdb/ (repo for our forked foundationdb, with custom code)
I create the tarball after building my branch from my forked repo with custom code:
ubuntu@machine1:~/custom-fdb-6.3$ tar -czvf joshua_tar.tar.gz ../build/fdb.cluster
but I’m getting this error with one of the fdb dependencies, which I can’t seem to resolve:
ubuntu@eso-jenkins-2b-slave-db145-i-0612e3011ac135780:~/fedramp-fdb-6.3-audit-logging/fdb-joshua$ python3 -m joshua.joshua start --tarball joshua_tar.tar.gz
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/fedramp-fdb-6.3-audit-logging/fdb-joshua/joshua/joshua.py", line 32, in <module>
    from . import joshua_model
  File "/home/ubuntu/fedramp-fdb-6.3-audit-logging/fdb-joshua/joshua/joshua_model.py", line 43, in <module>
    import fdb.tuple
ModuleNotFoundError: No module named 'fdb.tuple'
This is a complex question… The general answer is that this is currently at the discretion of the reviewer. I would say the rule of thumb is like this:
If the reviewer believes that a change won’t have a performance impact, the change can just be merged. If we catch a regression down the road we might revert the change and ask the contributor to address the problem.
If we believe there’s a positive performance impact we might do the same – although having proof (or at least some minor benchmark) is obviously preferred.
For complex changes it will depend. If we’re interested in the change, we might just do the performance analysis ourselves (if you plan to contribute a large feature, it would be good to communicate with people before starting the implementation anyways). Otherwise we will probably at least give guidance.
Generally though, we have some internal infrastructure that can run performance tests pretty much automatically, and for 99.99% of all PRs this is sufficient. So I don’t think an external contributor has to worry about performance results. If you’re planning, for example, to rewrite the commit pipeline, we will probably need to collaborate on that, or you would need to prove that your approach is better (however you define better…).
This looks like you don’t have the fdb Python bindings installed. I never set up Joshua myself; I think other people (for example @kao.makino) could help you more than I can…
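One quick way to narrow this down is to ask the interpreter where it would load `fdb` and `fdb.tuple` from. This is just a diagnostic sketch using the standard library; `module_origin` is a name made up here. Since the traceback complains about `fdb.tuple` rather than `fdb` itself, it may be that some other package or directory named `fdb` is being picked up instead of the FoundationDB bindings, but that is a guess.

```python
import importlib.util


def module_origin(name):
    """Return where `name` would be imported from, or None if it cannot be found.

    Note: for a dotted name such as "fdb.tuple", find_spec imports the
    parent package ("fdb") as a side effect.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when the parent package is missing or is not a package.
        return None
    if spec is None:
        return None
    if spec.origin:
        return spec.origin
    # Namespace packages have no single origin file.
    return "namespace package at " + ", ".join(spec.submodule_search_locations)


# Run this with the same interpreter that runs `python3 -m joshua.joshua`.
for name in ("fdb", "fdb.tuple"):
    print(name, "->", module_origin(name))
```

If `fdb` resolves to a path but `fdb.tuple` prints `None`, the `fdb` being found is probably not the bindings package, and the printed path shows where it lives.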
wget https://www.foundationdb.org/downloads/6.2.15/bindings/python/foundationdb-6.2.15.tar.gz
tar xzvf foundationdb-6.2.15.tar.gz
cd foundationdb-6.2.15/
python setup.py install
cd ..
Joshua
git clone https://github.com/FoundationDB/fdb-joshua.git
cd fdb-joshua
python setup.py install
But I’d recommend using the Joshua agent Docker image even for the client, so you don’t need to do any of the above.
Hey @kao.makino, I’m going with the Docker approach you recommended, but I’m running into some issues…
I’ve built their docker image via:
./build.sh
The Docker images produced by ./build.sh are there:
(python3) ubuntu@random-computer:~/fdb-6.3/build$ docker images
REPOSITORY                  TAG                         IMAGE ID       CREATED          SIZE
foundationdb/agent-scaler   20220125191625-e1ef13f681   961240cf5491   35 minutes ago   336MB
foundationdb/agent-scaler   latest                      961240cf5491   35 minutes ago   336MB
foundationdb/joshua-agent   20220125191625-e1ef13f681   cf59afc5b719   36 minutes ago   3.37GB
foundationdb/joshua-agent   latest                      cf59afc5b719   36 minutes ago   3.37GB
centos                      7                           eeb6ee3f44bd   4 months ago     204MB
centos                      latest                      5d0da3dc9764   4 months ago     231MB
To start agents, I run:
sudo docker run --rm -v /home/ubuntu/fdb-6.3/build/fdb.cluster:/opt/joshua -it foundationdb/joshua-agent:latest
but I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/home/ubuntu/fdb-6.3/build/fdb.cluster" to rootfs at "/opt/joshua" caused: mount through procfd: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.
I’ve tried unmounting the volume from the image, but I’m still getting the error. Any pointers are appreciated. Thanks!
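The error message itself points at a type mismatch: a Docker bind mount has to map a file onto a file or a directory onto a directory, and here the host side (fdb.cluster) is a file. A small sketch of building a `-v` argument that keeps the two sides consistent is below. The helper name `bind_mount_arg` is made up for illustration, and it assumes that /opt/joshua is a directory inside the image (which the error suggests) and that the agent would find the cluster file under that directory by its original name, which I have not verified.

```python
import os


def bind_mount_arg(host_path, container_dir):
    """Build the value for `docker run -v` so both sides have the same type.

    A host directory is mounted onto container_dir itself; a host file is
    mounted onto a file of the same name inside container_dir.
    Assumption: container_dir is a directory inside the image.
    """
    host_path = os.path.abspath(host_path)
    container_dir = container_dir.rstrip("/")
    if os.path.isdir(host_path):
        return f"{host_path}:{container_dir}"
    if os.path.isfile(host_path):
        return f"{host_path}:{container_dir}/{os.path.basename(host_path)}"
    raise FileNotFoundError(host_path)
```

With the paths from the command above, `bind_mount_arg("/home/ubuntu/fdb-6.3/build/fdb.cluster", "/opt/joshua")` would yield `/home/ubuntu/fdb-6.3/build/fdb.cluster:/opt/joshua/fdb.cluster`, provided that file exists on the host. Mounting a host directory that contains only the cluster file onto /opt/joshua would be the directory-to-directory alternative.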