Run nightly tests before pull-request submission

Hi everyone,

I’m from the Wavefront team.
We’d like to run nightly tests for FoundationDB before we submit pull requests.
Could you tell us how we would do that?
Are there any existing tools for that, or do we need to create them?
For now we can execute tests one by one. Is there a ready-made way to run a group of tests, or do we need to organize that ourselves?

Regards,
Marine

Do you mean the simulated correctness runs?

To do that, you should be able to just run “fdbserver” in a loop. In particular, you’d want to run it with:

./bin/fdbserver -r simulation -f tests

as the arguments. There’s some work to do to collect and aggregate the results, but you can also look at the exit code to know whether a test has failed. Each time, it will pick a random test to run, though you could also imagine a wrapper that does something more sophisticated, like picking the test for you, running upgrade tests by starting an older version and then switching to a newer one to verify that upgrades work, or running the same test twice with the same seed and verifying that you get the same unseed both times to make sure the tests really are deterministic.


Edit: On further investigation, it actually doesn’t pick a test, so you’ll probably want to wrap it in a script that chooses a test file from one of the subdirectories of the “tests” directory. (Starting with “fast” and “slow” and sometimes “rare” is probably fine. Adding “restarting” later is probably reasonable, but a little more involved, given that you need to be able to pick an old version, run it, and then pick a new version.) Your wrapper script should probably also handle the buggify flag, so that, with some probability, you add -b on to increase the number of error cases that fdbserver has to handle.
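
For illustration, here is a rough sketch of what such a wrapper could look like (untested; the binary path, the .txt file pattern, the 50% buggify chance, and the results file name are all assumptions you would adjust to your own layout):

#!/bin/bash
# Rough sketch: pick a random test file from the fast/slow/rare
# subdirectories, sometimes turn buggify on, run the simulation,
# and record pass/fail for each run.
FDBSERVER=./bin/fdbserver
RESULTS=sim_results.log

# Assumes the test files are the .txt files under these subdirectories.
FILES=(tests/fast/*.txt tests/slow/*.txt tests/rare/*.txt)

while true; do
    # Pick one test file at random.
    TEST=${FILES[RANDOM % ${#FILES[@]}]}

    # With some (arbitrary) probability, turn buggify on to exercise
    # more of the error-handling paths.
    BUGGIFY=""
    if (( RANDOM % 2 == 0 )); then
        BUGGIFY="-b on"
    fi

    # The exit code tells us whether the run passed or failed.
    if $FDBSERVER -r simulation -f "$TEST" $BUGGIFY; then
        printf 'PASS\t%s\t%s\n' "$TEST" "$BUGGIFY" >> "$RESULTS"
    else
        printf 'FAIL\t%s\t%s\n' "$TEST" "$BUGGIFY" >> "$RESULTS"
    fi
done

Kill it in the morning (or give it a run budget) and look at the log to see which runs failed.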

I have a lot of questions regarding the testing process:

  1. Do all these tests (fast, slow, rare) contain different scenarios? If yes, then when a developer makes some changes, do they need to run all these tests before committing?
  2. You mentioned running tests from the fast, slow, and rare directories. What about the others? Why don’t we need to run them?
  3. Is there any documentation for the tests?
  4. According to this, simulation and performance tests are run every night. If possible, could you describe how that is implemented? Does it pick a random test for simulation, or does it run all of them?
  5. How can we compare the simulation results of our local executable with the results of release-5.2? Do we only need to compare passed/failed, or do we need to compare the output logs or something more?

Each test file (more or less) contains a different scenario, and because each test is randomized, each run of a given file is also, in some sense, a different scenario. (Each run also gives you a seed you can supply at startup, so you can re-run the exact scenario that surfaced an error.)
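
For example, if a particular run failed, you should (assuming I’m remembering the flags correctly; check fdbserver --help) be able to reproduce it with something like:

./bin/fdbserver -r simulation -f <the failing test file> -s <the seed from the failing run>

adding -b on as well if the failing run had buggify turned on.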

Ideally, the developer would run each of the tests many times (with different seeds) before submitting a PR. We also run these tests ourselves, so we’d almost certainly catch something introduced in a PR before it was released into the wild, but it would be harder to track down at that point. If you have some idea as to which part of the code you are affecting, you could probably get away with running only a subset of the tests, at least initially. (For example, if you made a change to the backup code, you might want to run the tests that have “Backup” in the name.)
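
As a sketch, assuming the test files are the .txt files under the “tests” subdirectories, a single targeted pass over the backup-related tests could be as simple as:

for TEST in tests/*/*Backup*.txt; do
    ./bin/fdbserver -r simulation -f "$TEST" || echo "FAILED: $TEST"
done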

Hm, I may be mistaken. I think you also want to run the status tests, but I could be wrong. The other directories are: (1) python_tests, which is only about testing the Python bindings, so it doesn’t really make sense to run in simulation; (2) restarting, which requires that you first run the server at an older version, stop it, and then start it at a newer version (these are good to run, especially if you change anything about the on-disk format, but they require that you have multiple fdbserver versions lying around); and (3) the files directly in the “tests” directory, which are generally more about performance than correctness, so they don’t really make all that much sense to run in simulation.

I guess it depends on what you mean. Most of our documentation is oriented towards someone trying to use FoundationDB rather than develop it, so there aren’t a lot of docs. The page you found about simulation and testing is probably the closest thing we have to docs, I suppose. But there isn’t, to my knowledge, a comprehensive guide on which tests do what.

For simulation, it’s essentially what I outlined: a test harness picks a random test and a random seed, runs the test in simulation, and then sees whether it succeeds or fails. This happens in a loop, so over the course of a night you can get many, many runs. For the performance tests, we run the built-in performance tester that has come up elsewhere on the forums: it’s just a matter of setting up a (real, not simulated) FDB cluster, running the load tester that’s in the binaries, and collecting statistics.
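
If your wrapper logs one line per run the way the sketch above does (the sim_results.log name and its tab-separated format are just my assumptions), summarizing a night of simulation runs can be as simple as counting passes and failures per test file:

cut -f1,2 sim_results.log | sort | uniq -c | sort -rn

which prints, for each (status, test file) pair, the number of runs with that outcome.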

Generally, pass/fail should be enough. We have some API stability tests already to make sure that the external contract is preserved (which is generally the behavior that we care the most about keeping consistent across versions), though that could definitely be improved, so if a change might affect the external API, there might be slightly more work to be done to verify that the contract hasn’t changed. (Sorry if this answer is somewhat vague.)