How are you testing your layers?

This post is meant to be the developer equivalent of the "What do you monitor?" post.

How are you testing your layers? Are you testing special behaviors like Transactions with unknown results? Is there a way to manipulate the simulation from a binding?


I don’t think it was advertised much, but I’d suggest looking into the client buggify support if you haven’t seen it. I think the only adopter was CouchDB (IIRC it was implemented specifically with them in mind), and their codebase still shows the artifacts of its integration.

It looks like that integration was done by @nickva, who might be a nice enough person to comment on how useful it was (or wasn’t) in terms of actually finding, reproducing, and fixing bugs, or if there’s any caveats of note. :slight_smile:

I at least see a couple bug fix commits which specifically reference issues it found:


I missed it somehow, thanks :smile: I’m not seeing any way to provide a defined seed to drive client_buggify; is this the expected behavior? I was expecting something like the bindingTester, with seeds that can be injected/replayed.

Thanks for the links!

I think we didn’t bother since we kind of expect client applications to make heavy use of threads, and even if the client is single-threaded, the ordering of events (i.e. when callbacks are called as futures become ready) still depends on the order in which the server responds to requests. That said, I don’t think we’d be opposed to, say, a network option which sets the state of the random number generator.

My personal pet client testing crusade has to do with “in-flight” commits. Usually when a commit future becomes ready, the fdb client has taken great care to ensure that the commit is not in flight. For transaction_timed_out and cluster_version_changed though, the commit might still be in flight! A reasonable user might do the following:

  1. Attempt to commit a transaction
  2. That commit future fails with transaction_timed_out
  3. Read a unique key that was to be set in that transaction
  4. Observe that the key is absent and conclude incorrectly that the transaction did not commit and will never commit
  5. The commit (which has been “in-flight” this whole time) succeeds after the read in step 4

For transaction_timed_out this isn’t so bad, as the default retry loop (i.e. on_error) does not consider transaction_timed_out to be retryable. It does, however, consider cluster_version_changed to be retryable.

Client buggify does attempt to simulate this situation if you have a timeout set on your transaction. I suppose we could do the same with cluster_version_changed if you’re using the multiversion client (or maybe just unconditionally?)

The Developer Guide (FoundationDB 7.1) has some more detail about this (and should probably be updated to mention cluster_version_changed). It also mentions that cancelling a commit future does not stop that commit from being in flight.
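
To make the hazard concrete, here’s a minimal Python sketch of the incorrect pattern described in the steps above (the key name is invented for illustration):

```python
import fdb

fdb.api_version(710)
db = fdb.open()

# Illustrative: a key that only this transaction would ever set.
unique_key = b'txn-markers/abc123'

tr = db.create_transaction()
tr.options.set_timeout(5000)  # with a timeout set, transaction_timed_out is possible
tr[unique_key] = b''
try:
    tr.commit().wait()                      # step 1
except fdb.FDBError as e:
    if e.code == 1031:                      # transaction_timed_out (step 2)
        # Step 3: read the marker back in a fresh transaction.
        check = db.create_transaction()
        if not check[unique_key].present():
            # Step 4: WRONG. The original commit may still be in flight
            # and can succeed after this read (step 5).
            print('concluding the transaction never committed: unsafe!')
```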

Here are a few more popular ways of writing bugs that don’t show up in local testing:

  1. Attempting to maintain an in-memory data structure that is derived from database state. This turns out to be tricky to get right. Importantly, if you set $key to some $value in a transaction and commit that transaction with $commit_version, you cannot assume that $key is set to $value at $commit_version. This is because commits are done in batches, the entire batch of commits gets the same commit version, and a transaction appearing later in the batch may have performed a blind write to $key. It’s much easier if you maintain your in-memory data structure based only on reads. If you’re making decisions about what keys to write based on the state of this in-memory data structure, it gets even more complicated. I would consider just performing the necessary reads to re-derive the data structure in every transaction, but this might cause a read hotspot. Otherwise you could carefully track the read version and the keys your data structure is derived from, use that read version as the read snapshot of your transaction, and add those keys to your read conflict range (there’s a sketch of this after the list). If you need to avoid a read hotspot and need linearizability (i.e. a fresh read version), then you can look into a scheme using \xff/metadataVersionKey (again, also tricky).

  2. Performing operations on a transaction outside of a retry loop. Most bindings come with some kind of doTransaction function that accepts a lambda and implements the default retry loop, so if you use that you probably don’t need to worry about this.

  3. Reading at snapshot isolation. If you don’t test with concurrency, you may never notice that you’re missing a read conflict you actually wanted.

  4. Interacting with mutable data structures that have a lifetime longer than the body of your retry loop, e.g. appending to the same list every time you retry.
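
For item 1, here’s a rough Python sketch of the read-version-tracking approach; the cache shape and key names are invented for illustration:

```python
import fdb

fdb.api_version(710)
db = fdb.open()

# Invented example: an in-memory cache derived from a single config key.
class DerivedCache:
    source_key = b'config/mode'

    def __init__(self):
        self.read_version = None  # the version this cache was derived at
        self.value = None

    def refresh(self, db):
        tr = db.create_transaction()
        v = tr[self.source_key]                 # derive from reads only
        self.value = v if v.present() else None
        self.read_version = tr.get_read_version().wait()

def decide_and_write(db, cache):
    # (In real code this belongs inside a retry loop; see the sketch below.)
    tr = db.create_transaction()
    # Pin the transaction to the snapshot the cache was derived from. NB: a
    # read version more than ~5 seconds old fails with transaction_too_old.
    tr.set_read_version(cache.read_version)
    # Conflict-check the keys the cached decision depends on, so the commit
    # fails (not_committed) if they changed since the cache was built.
    tr.add_read_conflict_key(cache.source_key)
    if cache.value == b'enabled':               # decision from cached state
        tr[b'derived/output'] = b'on'
    tr.commit().wait()

cache = DerivedCache()
cache.refresh(db)
decide_and_write(db, cache)
```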

If you’re using the default retry loop, using snapshot isolation sparingly, testing with concurrency, checking invariants in your data model, and making sure your transactions actually do a lap around the retry loop sometimes, you’re probably in pretty good shape.
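
For reference, the default retry loop that those doTransaction-style helpers (e.g. Python’s @fdb.transactional decorator) implement looks roughly like this; a sketch, since real implementations also add retry/time limits:

```python
import fdb

fdb.api_version(710)
db = fdb.open()

def do_transaction(db, body):
    tr = db.create_transaction()
    while True:
        try:
            result = body(tr)
            tr.commit().wait()
            return result
        except fdb.FDBError as e:
            # on_error re-raises if the error isn't retryable (e.g.
            # transaction_timed_out); otherwise it backs off and resets
            # the transaction so the loop can run the body again.
            tr.on_error(e).wait()

# Usage: the body may run several times, so it must not mutate state
# that outlives the loop (pitfall 4 above).
do_transaction(db, lambda tr: tr.set(b'hello', b'world'))
```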

Btw, if you’re using client buggify, I would recommend introducing a few bugs intentionally and tweaking the “section_activated” and “section_fired” probabilities until you are actually catching those bugs, but still completing transactions.
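
In the Python binding, that tuning looks something like this (the option setters are generated from fdb.options, so treat the exact names as an assumption on my part):

```python
import fdb

fdb.api_version(710)

# Network options must be set before the network starts, i.e. before fdb.open().
fdb.options.set_client_buggify_enable()
# Start with high probabilities to confirm your tests catch intentionally
# planted bugs, then dial them down until transactions still complete.
fdb.options.set_client_buggify_section_activated_probability(25)
fdb.options.set_client_buggify_section_fired_probability(25)

db = fdb.open()
```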

Thanks @alexmiller! Running CouchDB tests with client buggify [1], we discovered quite a few bugs. A good number of those had been randomly popping up in production but were tricky to reproduce during testing. The best part is how easy it was to enable: it’s just a few environment variable toggles. I highly recommend giving it a try; it’s an awesome FoundationDB feature.

Also, I agree with @andrew.noyes on the lack of seed support. At least for Erlang, with lots of concurrently running Erlang processes, a deterministic seed wouldn’t have helped us much anyway.

[1] ERL_ZFLAGS="-erlfdb network_options '[client_buggify_enable, {client_buggify_section_activated_probability, 25}, {client_buggify_section_fired_probability, 25}]'" make elixir tests=...

As Andrew pointed out, setting a seed will not make client buggify deterministic. This is true even if the client is single-threaded and behaves completely deterministically. Mostly this is because system calls (network, disk, etc.) are not deterministic. In the simulator we work around this problem by simulating system calls, but in normal operation this doesn’t work.

If you want determinism, the right solution would be to run your whole layer in the simulator. Now, this is not widely used (and therefore probably has bugs, and you might need to recompile fdb in order to use it), but I do believe it would be the most thorough testing you can get.

I wasn’t aware of these behaviors :upside_down_face: Thanks for sharing. Are there any other weird behaviors I should be aware of?

You confirmed an intuition I had about caching.

Sounds like a testing plan to me :+1:

I get the “pop the issue during testing and not in prod” part, but I’m not sure how to industrialize this, as you cannot reproduce it locally. From my point of view, you can only blindly “brute-force” the issue and hopefully find the problem fast enough.

Determinism is pretty awesome; I was mind-blown by the productivity I have when running the bindingTester. The whole “find a faulty seed, fix it, repeat” dev experience has been great for stabilizing the Rust bindings, but I don’t feel we need it yet as a first testing-layer strategy (I might be wrong on this :thinking:).

A first testing strategy I had in mind was to make sure that we are handling errors correctly in our work-in-progress layer SDK, in a bindingTester-like fashion (a rough sketch follows the list):

  • Generate a seed,
  • Run a workload which will fire some errors on the bindings,
  • Validate the state of the keys in fdb.
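
As a rough illustration of that plan, here’s a Python sketch (the workload, probabilities, and injection mechanics are invented; and per the discussion above, the seed only makes the injection schedule reproducible, not the client itself):

```python
import random
import fdb

fdb.api_version(710)
db = fdb.open()

SEED = 42                     # report this seed when a run fails
rng = random.Random(SEED)

def flaky(body):
    """Wrap a transaction body so it sometimes raises a retryable error,
    exercising the binding's error-handling paths."""
    def wrapped(tr):
        if rng.random() < 0.25:
            raise fdb.FDBError(1020)  # not_committed: retryable
        return body(tr)
    return wrapped

def run(db, body):
    tr = db.create_transaction()
    wrapped = flaky(body)
    while True:
        try:
            result = wrapped(tr)
            tr.commit().wait()
            return result
        except fdb.FDBError as e:
            tr.on_error(e).wait()

# Workload: write a key, with errors injected along the way.
run(db, lambda tr: tr.set(b'test/key', b'expected'))

# Validate the final state of the keys in fdb.
check = db.create_transaction()
assert check[b'test/key'] == b'expected', 'seed %d: bad state' % SEED
```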

Checking concurrency is another step, and at that point I agree with you @markus.pilman, we need determinism for that :smile: In Rust, async runtimes are in theory swappable, so we may be able to provide determinism. In fact, I was so curious about FDB’s simulation framework and how it could behave in Rust that I tested things for fun.

I wasn’t aware of this, thanks a lot! I was searching for a way to do something similar; we are going to try this :smile: Why do you think we might need to recompile fdb?

Out of curiosity, did you already discuss the possibility of exposing some simulation functionality through the C API? I haven’t written any Flow yet and have only a light knowledge of the simulation framework, so I could be completely off the mark, but I’m wondering if a layer could ask things directly of the simulation framework, like:

  • next random value,
  • should I buggify this line,
  • what (simulated) time is it,
  • and a lot of complicated things I cannot imagine right now :stuck_out_tongue_closed_eyes:

What do you think?

My memory was off: I don’t think you would need to compile your own FDB, and it should work (although I never tested this with an installed version of FDB). Generally though, be aware that we’re not really using this feature, so I would expect some bugs.

In simulation there is no C API. I think you’re thinking of the binding tester, which is different (the fdb simulator runs a chaos test, while the binding tester is intended to check that multiple bindings implement the same logic).

So what you’re suggesting would only be useful in a fully simulated workload, and those are usually implemented in Flow and use the native API (so you can do all the things you describe and much more).

You’re right that the current feature of embedding a binding workload in the fdb simulator is of limited use, because a lot of things you might want to do are not possible anymore, or they will at least change determinism (you can get away with a lot of stuff as long as you don’t use threads or any async APIs, but for example you can’t even use time).

It would be a really cool project to make this mechanism more available, though. Basically, we would need to provide a framework and some kind of standard library replacement that an application could call into to get simulated behavior. Then you could chaos-test your application. You could also do other interesting things that are hard to implement outside of the simulator: starting a new process on a different machine (maybe even in a different data center), sending network messages between clients, killing processes, introducing network partitions, etc. You could take this to the extreme and even use the simulator to test applications that don’t use FDB at all.

While this would be cool (and probably useful), there’s currently just nobody using the simulator outside of FDB core testing. A project like this would be a huge investment, and I don’t think we would get enough value out of it to prioritize it. So for now it is just a cool idea.

Thanks @markus.pilman for your detailed answer :smile:

I will give it a try anyway :smiley:

I agree: a cool idea, but a huge investment.