How much time and resources are dedicated to FoundationDB testing?

I recently learned about the testing technique used in FoundationDB, and it is amazing to me. What I’ve been looking for is a testing framework that can effectively detect and reproduce bugs in distributed systems. Obviously FDB has done a lot of work in this area.

I have some questions to ask:

  1. In your team’s testing environment, how many machines are you willing to dedicate to support such extensive testing, especially concurrent testing triggered by multiple developers (considering isolation of tests between them)?
  2. Once a developer pushes a commit, are all of the FDB test cases (I mean all of them) run? I imagine it takes a lot of time before a commit is considered OK by every test case. Is this acceptable to everyone?
  3. How long does it take to finish all test cases?

To sum up, I’m concerned about the time and resources consumed by such rigorous and complex testing during everyone’s daily development.

Any insights? I’d love to hear them.

There is a blog post that describes how Snowflake is running these simulation tests.

To give a data point, each PR receives 10k simulation runs in CI. During development, we typically run 100k simulation tests for a PR before considering it ready for review. The resource requirement is also significant: 100k tests need roughly 500 cores for about two hours, and the wall-clock time can be reduced by throwing more cores at it.
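For a rough sense of scale, here is a back-of-the-envelope sketch of the core-hours implied by those figures. The numbers are just the ones quoted above, and it assumes the runs parallelize cleanly across cores:

```python
# Back-of-the-envelope cost of FDB simulation testing, using the figures
# quoted above (roughly 500 cores for ~2 hours per 100k runs). Illustrative only.

cores = 500            # cores used for a 100k-run development batch
hours = 2              # approximate wall-clock time for that batch
runs_dev = 100_000     # simulation runs per PR during development
runs_ci = 10_000       # simulation runs per PR in CI

core_hours_dev = cores * hours                    # ~1000 core-hours per 100k-run batch
secs_per_run = core_hours_dev * 3600 / runs_dev   # ~36 core-seconds per run on average

# Scaling the same per-run cost down to the 10k CI runs:
core_hours_ci = secs_per_run * runs_ci / 3600     # ~100 core-hours per PR in CI

print(f"~{core_hours_dev:.0f} core-hours per 100k-run dev batch")
print(f"~{secs_per_run:.0f} core-seconds per simulation run on average")
print(f"~{core_hours_ci:.0f} core-hours per PR for the 10k CI runs")
```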


Wow, that’s a lot of time per PR. I’m afraid that may just be the cost of rigorous testing. But it has to be done.