[Java] Validity of `tx.get()` futures not waiting to complete before `tx.commit()` being called

(gaurav) #1


Is the pattern in the code snippet below valid? (There may be better ways to achieve the same result, but right now I am only trying to validate this pattern.)

Here, the calls into the underlying C client library are guaranteed to be made inline in the function body. The function just does not wait for the read futures to complete.

What I am not sure about is whether the underlying C library will (a) block the commit call until all these read futures are ready, (b) go ahead with the commit and set the read futures later (which probably does not make sense), or (c) throw an exception if there are pending read operations when commit is called.

-> {
    final CompletableFuture<byte[]> f1 = tx.get(new Tuple().add("a").pack());
    final CompletableFuture<byte[]> f2 = tx.get(new Tuple().add("b").pack());
    final CompletableFuture<byte[]> f3 = tx.get(new Tuple().add("c").pack());
    // ... may be even try to read more values based on above result / or perform some set().
    return CompletableFuture.allOf(f1, f2, f3);


(Clement Pang) #2

It is valid to commit the transaction before the reads return. The reads will all have the same read version (the first read triggers fetching it), so you still get proper snapshot isolation. (Side topic: for reads to be linearizable, having the read version is critical.) Of course, once you have committed the transaction, you can't do anything with the read results in that same transaction. If a read took a long time (to reach the proper servers, for instance), it could throw a past version error, since the read version would by then be more than 5 seconds old.
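
For the case where the values are needed inside the same transaction, the read futures have to be combined into something that carries the results before the function returns. The following is a minimal sketch using only `java.util.concurrent`. `fakeGet` and `readAll` are hypothetical names: `fakeGet` stands in for `tx.get(...)`, which in the first post's snippet returns `CompletableFuture<byte[]>`, and no FoundationDB calls are made.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class ReadJoinSketch {
    // Hypothetical stand-in for tx.get(key): completes asynchronously with a value.
    static CompletableFuture<byte[]> fakeGet(String key) {
        return CompletableFuture.supplyAsync(
            () -> ("value-of-" + key).getBytes(StandardCharsets.UTF_8));
    }

    // Issue all reads first, then combine them into one future that
    // completes with every value, in the same order as the keys.
    static CompletableFuture<List<String>> readAll(List<String> keys) {
        List<CompletableFuture<byte[]>> reads = keys.stream()
            .map(ReadJoinSketch::fakeGet)
            .collect(Collectors.toList());
        return CompletableFuture
            .allOf(reads.toArray(new CompletableFuture[0]))
            // Every read is complete here, so join() does not block.
            .thenApply(ignored -> reads.stream()
                .map(f -> new String(f.join(), StandardCharsets.UTF_8))
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(readAll(List.of("a", "b", "c")).join());
    }
}
```

Unlike a bare `CompletableFuture.allOf(...)`, which completes with `Void`, the combined future here carries the values, so follow-up reads or `set()` calls that depend on them can be chained onto it before the transaction is committed.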

(gaurav) #3

Thanks Clement! I have a few more questions (as I had almost convinced myself that the above pattern had to be illegal).

  • What if the transaction is committed and closed before the client library has completed the reads? Wouldn't the C library free the transaction's resources on close(), making it problematic for the earlier read() calls to complete?
  • What if there were conflicts on the server with the reads issued in the transaction? Since the transaction's commit() is responsible for conflict checking, once that call has completed there will be no further conflict checks on reads issued via the committed transaction. (Note: the reads are not made on a snapshot() of the db, and hence the stronger conflict-checking rules should apply.)

I am thinking of the following history of events:
– tx1.create()
– tx2.create()
– future(
– future( (this call is not even sent to server yet by client thread)
– a = (future completed)
– tx1.commit()
– tx2.write(b)
– tx2.commit()
– b = (call gets launched and completed)

Thanks for hinting at the snapshot direction. It is interesting to ask the original question assuming that the reads were issued on a snapshot of the database.

PS: Is there a help section that explains the precise semantics of what constitutes a conflict in FDB? I am basing the above questions on the following text from the Dev Guide, which says that if any value read during a transaction is modified by another transaction that commits before this one does, it is considered a conflict:
When the transaction is ready to be committed, the FoundationDB cluster checks that it does not conflict with any previously committed transaction (i.e. that no value read by a transaction has been modified by another transaction since the read occurred) and, if it does conflict, rejects it.


(Alex Miller) #4

Note that read-only transactions are serialized as of the read version, and not when commit is called. Unless you add a write conflict range to your transaction, calling commit() will be a no-op. The .snapshot() method gives you a way to do reads such that they don’t add read conflict ranges.

That pretty much is it. Each transaction has a read conflict set and a write conflict set. Transactions can commit if there haven’t been any writes to the database between their read version and commit version that intersect with their read set. If they commit, then their write conflict set is remembered for future batches of transactions that might conflict with the now-committed transaction.
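
The rule above can be illustrated with a toy model. This is only a sketch under loud simplifying assumptions: it tracks individual keys rather than conflict ranges, runs single-threaded, never prunes its history, and treats "no writes" as "no write conflict range". The class `ConflictCheckSketch` and its methods are hypothetical names, not FoundationDB code. A snapshot read is modeled simply by leaving the key out of the read set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the commit rule described above; not FoundationDB code.
class ConflictCheckSketch {
    static final long CONFLICT = -1L;

    // A committed transaction: its commit version and the keys it wrote.
    private static final class Committed {
        final long commitVersion;
        final Set<String> writeSet;

        Committed(long commitVersion, Set<String> writeSet) {
            this.commitVersion = commitVersion;
            this.writeSet = writeSet;
        }
    }

    private final List<Committed> history = new ArrayList<>();
    private long lastVersion = 0L;

    // A transaction starting now would take this as its read version.
    long currentVersion() {
        return lastVersion;
    }

    // Returns the commit version, or CONFLICT if a key in readSet was written
    // by a transaction that committed after readVersion.
    long tryCommit(long readVersion, Set<String> readSet, Set<String> writeSet) {
        if (writeSet.isEmpty()) {
            // Read-only: nothing to check, serialized as of the read version.
            return readVersion;
        }
        for (Committed c : history) {
            if (c.commitVersion <= readVersion) {
                continue; // this write was already visible at the read version
            }
            for (String key : c.writeSet) {
                if (readSet.contains(key)) {
                    return CONFLICT;
                }
            }
        }
        long commitVersion = ++lastVersion;
        history.add(new Committed(commitVersion, new HashSet<>(writeSet)));
        return commitVersion;
    }

    public static void main(String[] args) {
        ConflictCheckSketch resolver = new ConflictCheckSketch();
        long rv = resolver.currentVersion(); // all transactions below start here

        // Another transaction writes "b" and commits first.
        long writer = resolver.tryCommit(rv, Set.of(), Set.of("b"));
        // Read "a" and "b" normally, then write "c": "b" changed after rv.
        long conflicting = resolver.tryCommit(rv, Set.of("a", "b"), Set.of("c"));
        // Same, but "b" was a snapshot read, so it is not in the read set.
        long snapshot = resolver.tryCommit(rv, Set.of("a"), Set.of("c"));
        // Read-only transaction: commit does nothing in this model.
        long readOnly = resolver.tryCommit(rv, Set.of("a", "b"), Set.of());

        System.out.println(writer + " " + conflicting + " " + snapshot + " " + readOnly);
    }
}
```

In this model the read-only case never reaches the conflict check at all, which mirrors the point that such a transaction is serialized as of its read version rather than at commit time.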

There’s nothing special about the way FDB handles this compared to other databases, so reading other sources works equally well. I found these notes from Rensselaer Polytechnic Institute to be moderately easy to follow. For a formal treatment of the topic, I’d suggest Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom, Chapter 18. If you prefer watching a lecture, then I’d recommend Andy Pavlo’s CMU Database Systems - 19 Timestamp Ordering Concurrency Control.

(gaurav) #5

Thank you Alex. Appreciate the clarifications and further pointers!

(A.J. Beamon) #6

I believe it’s the case that the commit call will block until the reads all complete.