After using this technique for a while, I found a few cases that need special care.
The main issue is that the retry-loop handler may fail because the application code throws errors/exceptions due to invalid assumptions. Meaning that it (or code in a layer it uses) assumed that the value of a key was something, but then subsequent reads did not return the expected result, and the layer or application code interpreted that as some invalid situation (corrupted data? invalid state in the workflow?).
Real-life example: layer code uses a cached directory subspace’s prefix to read keys. The directory subspace was just deleted and then re-created with a new prefix. The old prefix is still in the cache, and points to either empty data or, WORSE, new data because the prefix was somehow reused! This is very frequent in unit test suites that repeatedly delete and re-create entire sub-trees between each test.
When this happens, and if there is AT LEAST one failed value-check, the retry loop HAS to retry the handler.
- If the exception was due to an invalid assumption, the layer or app code should see that the check failed, reload everything from the cluster, get the correct data, and not throw again.
- If the exception was unrelated, it should (probably) fail again.
This makes the technique even more difficult to implement well, because if any of the layer code mixes things up in the value-check retry logic, it may end up retrying forever…
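The decision rule above can be sketched as follows. This is a minimal sketch with invented names (`StubTx`, `should_retry`), not the real client API: when the handler throws, the loop checks whether any deferred value-check failed, and only then swallows the error and retries.

```python
# Minimal sketch of the retry decision rule (hypothetical names,
# NOT the real FoundationDB client API).

class StubTx:
    """Stand-in for a transaction that records which value-checks failed."""
    def __init__(self, failed_tags):
        self._failed_tags = failed_tags

    def failed_value_checks(self):
        return list(self._failed_tags)

def should_retry(tx, handler_error):
    # Rule from above: if the handler threw AND at least one value-check
    # failed, the error was probably caused by a stale assumption, so the
    # loop must swallow it and retry. With no failed check, the error is
    # unrelated and must be propagated to the caller.
    # (A real loop would also cap the number of attempts, to avoid the
    # "retrying forever" trap mentioned above.)
    return handler_error is not None and len(tx.failed_value_checks()) > 0
```

Note that a failed check with no exception still forces a retry at commit time (the `OnError NotCommitted (1020)` visible in the trace below); this sketch only covers the decision for the throwing case.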
But anyway, with that in place, here is an example of this in action:
The layer code is very simplistic and assumes that it has a "Foo" key whose value is either "NotReady" or "Ready". The value of "Foo" becomes "Ready" after some (slow) initialization step that sets the value of a key "Bar" to some important value, which should NOT be null!
So the initial state of the database may be:
- "Foo" = "NotReady"
- "Bar" does not exist.
After executing the slow initialization process, the database SHOULD be:
- "Foo" = "Ready"
- "Bar" = something important for the application.
The transaction code expects to be in the “Ready” state, and will optimistically assume that it is true without doing any reads, and rely on retries to discover the ugly truth (that someone forgot to run the init script!)
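To make the scenario concrete, here is a runnable toy version of it. A plain dict stands in for the database and every name is invented for illustration; in particular, the real client issues the value-check read without awaiting it, whereas this sketch resolves it synchronously.

```python
# Toy simulation of the "optimistic handler + value-check" scenario.
# All names are hypothetical; this is not the real client API.

class Tx:
    def __init__(self, db, failed_from_previous):
        self.db = db
        self.failed_from_previous = failed_from_previous  # tags from last attempt
        self.failed = []

    def check_value(self, tag, key, expected):
        # Deferred value-check: record the tag if the actual value differs.
        if self.db.get(key) != expected:
            self.failed.append(tag)

    def get(self, key):
        return self.db.get(key)

    def set(self, key, value):
        self.db[key] = value


def retry_loop(db, handler, max_attempts=10):
    failed = []
    for _ in range(max_attempts):
        tx = Tx(db, failed)
        try:
            result = handler(tx)
        except Exception:
            failed = tx.failed
            if failed:
                continue   # stale assumption: retry with the failed tags
            raise          # unrelated error: propagate to the caller
        if tx.failed:
            failed = tx.failed
            continue       # a check failed even though the handler succeeded
        return result
    raise RuntimeError("value-checks kept failing; giving up")


def ensure_ready(tx):
    if "foo" not in tx.failed_from_previous:
        # Fast path ("I'm feeling lucky"): assume the db is initialized,
        # and attach a value-check on "Foo" instead of awaiting a read.
        tx.check_value("foo", "Foo", "Ready")
        if tx.get("Bar") is None:
            raise RuntimeError("Oh noes! There is some corruption in the database!")
        return tx.get("Bar")
    # Slow path: the check on "Foo" failed last time; do the actual work.
    if tx.get("Foo") != "Ready":
        tx.set("Bar", "Something")   # the slow initialization step
        tx.set("Foo", "Ready")
    return tx.get("Bar")


db = {"Foo": "NotReady"}             # initial state: "Bar" does not exist
print(retry_loop(db, ensure_ready))  # attempt #1 throws, attempt #2 initializes
print(retry_loop(db, ensure_ready))  # subsequent calls hit the fast path
```

The first call plays out exactly like transaction #10 below (optimistic attempt fails, value-check on "Foo" fails, second attempt does the real work), and the second call plays out like transactions #11 and #12.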
First transaction (that discovers something is wrong):
Transaction #10 (read/write, 17 operations, '#' = 0.5 ms, started 12:26:15.0039736Z [1590236775.003], ended 12:26:15.0139512Z [1590236775.013])
┌ oper. ┬─────────────────────┬──── start ──── end ── duration ──┬─ sent recv ┐
│ 0 // │ │ T+ 0.013 │ - - │ // APP: I'm feeling lucky! Let's assume the db is already initialized
│:0 G °│ ##$ │ T+ 0.022 ~ 1.066 ( 1,045 µs) │ 19 8 │ Get ("value_checks", "Foo") => 'NotReady'
│:0 // │ ` │ T+ 0.035 │ - - │ // APP: The value of 'Bar' better not be empty...
│:0 G °│ ##$ │ T+ 0.036 ~ 1.046 ( 1,009 µs) │ 19 0 │ Get ("value_checks", "Bar") => not_found
│ 1 // │ __` │ T+ 1.060 │ - - │ // APP: UH OH... something's wrong! let's throw an exception!!
│ 2 // │ ___` │ T+ 1.186 │ - - │ // Handler failed with error: [InvalidOperationException] Oh noes! There is some corruption in the database!
│:2 // │ ___` │ T+ 1.208 │ - - │ // Failed value-check 'foo' for ("value_checks", "Foo"): expected 'Ready', actual 'NotReady'
│:2 Er°│ ___X###########+ │ T+ 1.232 ~ 5.709 ( 4,478 µs) │ │ OnError NotCommitted (1020)
├────────┼─────────────────────┼──────────────────────────────────┼─────────────┤ == Attempt #2 == 12:26:15.0097157Z (1590236775.009)
│ 3 // │ °°°°°°°°°°°°°°°` │ T+ 5.742 │ - - │ // SetOption(Timeout, 15000)
│:3 // │ °°°°°°°°°°°°°°°` │ T+ 5.854 │ - - │ // Previous attempt failed because of the following failed value-check(s): foo
│:3 // │ °°°°°°°°°°°°°°°` │ T+ 5.872 │ - - │ // APP: doing the actual work to check the state of the db, and initialize the schema if required...
│:3 G °│ °°°°°°°°°°°°°°°.##$ │ T+ 5.878 ~ 6.982 ( 1,105 µs) │ 19 8 │ Get ("value_checks", "Foo") => 'NotReady'
│ 4 // │ °°°°°°°°°°°°°°°___` │ T+ 6.991 │ - - │ // APP: initializing the database!
│:4 s │ °°°°°°°°°°°°°°°___` │ T+ 7.010 ~ 7.017 ( 7 µs) │ 24 │ Set ("value_checks", "Foo") = 'Ready'
│ 5 s │ °°°°°°°°°°°°°°°___` │ T+ 7.019 ~ 7.020 ( 1 µs) │ 28 │ Set ("value_checks", "Bar") = 'Something'
│ 6 // │ °°°°°°°°°°°°°°°___` │ T+ 7.020 │ - - │ // APP: The value of 'Bar' better not be empty...
│:6 G │ °°°°°°°°°°°°°°°___` │ T+ 7.022 ~ 7.038 ( 16 µs) │ 19 9 │ Get ("value_checks", "Bar") => 'Something'
└────────┴─────────────────────┴──────────────────────────────────┴─────────────┘
> Read 25 bytes in 9.976 ms and 2 attempt(s)
The code first optimistically assumes that the layer is “Ready”, and expects a non-null value in “Bar”. If it is null, it assumes that there is some corruption in the database and throws an exception.
The retry-loop handler sees the exception, but also sees the failed value-check, and retries.
On the second attempt, the layer code does not assume anything: it reads “Foo”, sees that it is “NotReady”, and then does the actual work to initialize the db.
All the following transactions show the case where the value-check is successful.
Transaction #11 (read/write, 4 operations, '#' = 0.5 ms, started 12:26:15.0141316Z [1590236775.014], ended 12:26:15.0159212Z [1590236775.015])
┌ oper. ┬─────┬──── start ──── end ── duration ──┬─ sent recv ┐
│ 0 // │ │ T+ 0.008 │ - - │ // APP: I'm feeling lucky! Let's assume the db is already initialized
│:0 G °│ ### │ T+ 0.014 ~ 1.696 ( 1,682 µs) │ 19 5 │ Get ("value_checks", "Foo") => 'Ready'
│:0 // │ ` │ T+ 0.023 │ - - │ // APP: The value of 'Bar' better not be empty...
│:0 G °│ ### │ T+ 0.024 ~ 1.683 ( 1,659 µs) │ 19 9 │ Get ("value_checks", "Bar") => 'Something'
└────────┴─────┴──────────────────────────────────┴─────────────┘
> Read 14 bytes in 1.787 ms and 1 attempt(s)
Transaction #12 (read/write, 4 operations, '#' = 0.5 ms, started 12:26:15.0160095Z [1590236775.016], ended 12:26:15.0169962Z [1590236775.016])
┌ oper. ┬───┬──── start ──── end ── duration ──┬─ sent recv ┐
│ 0 // │ │ T+ 0.007 │ - - │ // APP: I'm feeling lucky! Let's assume the db is already initialized
│:0 G │ # │ T+ 0.011 ~ 0.945 ( 934 µs) │ 19 5 │ Get ("value_checks", "Foo") => 'Ready'
│:0 // │ ` │ T+ 0.016 │ - - │ // APP: The value of 'Bar' better not be empty...
│:0 G │ # │ T+ 0.017 ~ 0.935 ( 918 µs) │ 19 9 │ Get ("value_checks", "Bar") => 'Something'
└────────┴───┴──────────────────────────────────┴─────────────┘
> Read 14 bytes in 0.986 ms and 1 attempt(s)
What we can observe is that both the reads for “Foo” and “Bar” start and complete at roughly the same time, even though, logically, the application should first read “Foo” and then if the value is “Ready”, read the value of “Bar”:
- Start “Foo” read
- Await “Foo” read
- Start “Bar” read
- Await “Bar” read
In the above examples, though, both reads are merged; or rather, the read for “Foo” is not awaited immediately, but deferred until the very end.
So the actual order of operation is:
- Start “Foo” read
- Start “Bar” read
- Await “Bar” read
- Await “Foo” read
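This reordering can be imitated with ordinary futures. The sketch below uses asyncio stand-ins with invented names (`fake_get`, `naive`, `pipelined`) to contrast awaiting each read in turn with starting both reads and deferring the await on "Foo" to the end:

```python
# Toy illustration of sequential vs. pipelined reads (asyncio stand-ins,
# invented names; not the real fdb client).
import asyncio

async def fake_get(db, key, latency=0.001):
    await asyncio.sleep(latency)  # simulate one network round-trip
    return db.get(key)

async def naive(db):
    # Logical order: await "Foo" before even starting the "Bar" read,
    # => two round-trips back to back.
    assert await fake_get(db, "Foo") == "Ready"
    return await fake_get(db, "Bar")

async def pipelined(db):
    # Deferred order: both reads start immediately; "Foo" is only awaited
    # at the very end, so the two reads overlap in time.
    foo_future = asyncio.ensure_future(fake_get(db, "Foo"))
    bar = await fake_get(db, "Bar")
    assert await foo_future == "Ready"  # the deferred check
    return bar

db = {"Foo": "Ready", "Bar": "Something"}
print(asyncio.run(naive(db)))      # takes ~2x the latency
print(asyncio.run(pipelined(db)))  # takes ~1x the latency
```

In `pipelined`, "Foo" resolves while "Bar" is in flight but is only observed after it, which matches the trace: the "Foo" end time lands a few µsec after the "Bar" end time.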
It looks like the fdb client multiplexes both reads so that they are essentially merged into a single network round-trip! If you look carefully, you can see that the end time for the “Foo” read is a few µsec later than the end time for “Bar”.