Transactions without mutations

Hi folks! I’m toying with a scheme to serve cached data derived from a read-only transaction without re-deriving it or sacrificing linearizability. The scheme is roughly the following:

  1. Derive the data from a read-only transaction, and record the read snapshot and read conflict ranges of that transaction.
  2. Before serving from cache, validate the derived data by committing a transaction with the same read snapshot and read conflict ranges (and a dummy write conflict range to get the fdb client to actually send an rpc). If this commit succeeds, then you know that no transaction which could be known-committed before the request has changed the result of any of the reads involved in deriving the data.
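The validation in step 2 is essentially the resolver's conflict check applied to a cached read set. Here's a self-contained sketch of that check — `CacheEntry`, `validate`, and `committed_writes` are all illustrative names, not the FoundationDB API:

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    value: object
    read_version: int   # version of the read snapshot the data was derived at
    read_ranges: list   # read conflict ranges as half-open (begin, end) key pairs

def ranges_overlap(a, b):
    # Half-open key ranges [begin, end) intersect iff each starts before the
    # other ends.
    return a[0] < b[1] and b[0] < a[1]

def validate(entry, committed_writes):
    # committed_writes: (commit_version, (begin, end)) pairs for writes the
    # resolver has accepted.  The cached entry is still fresh iff no write
    # newer than its read snapshot touched its read conflict ranges --
    # which is what committing a transaction with those read conflict ranges
    # (plus a dummy write range) would check for real.
    for version, write_range in committed_writes:
        if version > entry.read_version and any(
            ranges_overlap(r, write_range) for r in entry.read_ranges
        ):
            return False
    return True

# A write before the snapshot, or outside the read set, keeps the cache
# valid; a newer overlapping write invalidates it.
entry = CacheEntry(value="derived", read_version=100, read_ranges=[(b"a", b"b")])
assert validate(entry, [(90, (b"a", b"b"))])        # predates snapshot
assert validate(entry, [(110, (b"c", b"d"))])       # disjoint range
assert not validate(entry, [(110, (b"a", b"aa"))])  # newer overlapping write
```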

A few questions:

  1. Does this seem sound?
  2. Waiting for the transaction batch to become durable on the tlogs is the biggest source of latency for validation in this scheme, but since these validation transactions have no mutations it seems unnecessary for the commit proxy to wait for logging before responding. However, it’s not sound in general for commit proxies to reply to commit requests with no mutations before their commit version is durable: a client could call getCommittedVersion and conclude incorrectly that mutations from any transactions with the same (or an earlier) commit version are durable. Should we not expose a committed version for transactions with no mutations [1]?
  3. Has anyone tried something like this before?

Btw another source of transactions without mutations is the dummy transactions used in the native api to validate that transactions are not in-flight, so this could improve transaction latency in some failure scenarios.

Edit: wacky idea: what if we present the largest known committed version as the commit version for transactions without mutations? Not sure if we can present a sensible versionstamp though since we wouldn’t have a batch index.

  1. We already do this for transactions with no mutations and no write conflict ranges: foundationdb/ at 8878de8c8f06ab7557b5edf40ffe6ae50a35ac19 · apple/foundationdb · GitHub ↩︎

Even if changes aren’t durable, it’d still seem safe to return before TLog commit: even if a recovery happens and some in-flight transactions get rolled back, none of them could have conflicted with your read-only transaction, otherwise the resolver would have objected in the first place. It’d be super weird, as you’d now have essentially a read version in a window of versions that was skipped by a recovery, but I can’t immediately think of anything that’d break. I think you’d basically just have to ask simulation :confused:. I’m not even convinced that a read-only transaction committing successfully while everything else around it gets rolled back post-recovery would be a linearizability or serializability violation.

Layer for read-write transactions lasting longer than 5 seconds - #10 by alexmiller was a similar old discussion of ours for long-running read-write transactions, and I still similarly think that periodically re-validating the read set and bumping the read version up would work there too. Similarly, you’d need to make sure that you re-validate your cache within 5 seconds, otherwise you’ll have to throw the entries away or re-read and recompute the cached value.

re: the wacky idea, you have to go through the resolver anyway, so I think you’re going to have already picked a sensible commit version to use? You’d even have a batch index; it’s just that you could reply early. Trying to tack an extra read-only transaction onto an already-committed batch feels, I mean, maybe it could work? But I’m not even sure the resolver supports resolution up to some version other than the most recent. This seems like a definitely harder direction to try to go.

If you actually used that read version to read, you might read uncommitted data.