I have a question regarding the read version. I would appreciate it if you could help me understand.
For read-write transactions, why do we need to get the read version, and why must the database return values with versions no greater than the read version? I know this is for isolation, i.e., to make sure the transaction is reading the database at a point in time defined by the read version. However, if the database does not return the most recent version while such a version exists, then the transaction is guaranteed to be aborted at conflict-checking time.
For example: suppose I have versions 6 and 4 for a key, and a transaction is assigned read version 5. When the client wants to read the key, the database returns version 4, since 6 is greater than 5. But by doing so, we know for sure the transaction will be aborted at commit time, because it will be assigned a commit timestamp larger than that of any previously committed transaction, i.e., larger than 6, e.g., 7. So now there is a version with a timestamp less than 7 that was not read by the transaction, and conflict checking will abort it.
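To make that concrete, here is a toy sketch in plain Python (not FoundationDB code; the names `versions`, `read`, and `conflicts` are made up for illustration) of the read and the conflict check I have in mind:

```python
# Toy sketch of the scenario above (plain Python, not FoundationDB code).
versions = {"x": [4, 6]}  # committed versions of key x

def read(key, read_version):
    """Return the newest version of `key` that is <= read_version."""
    return max(v for v in versions[key] if v <= read_version)

def conflicts(key, read_version, commit_version):
    """Resolver-style check: the transaction must abort if the key was
    written after its read version and before its commit version."""
    return any(read_version < v < commit_version for v in versions[key])

print(read("x", 5))           # 4 -- version 6 is hidden by read version 5
print(conflicts("x", 5, 7))   # True -- version 6 falls in (5, 7), so the commit is rejected
```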
Shouldn’t the client just read the most recent version we have for each key? We shouldn’t worry about reading an inconsistent view; if there is any conflict, that’s fine, as conflict checking will abort the transaction. On the other hand, if we deliberately don’t return the most recent version, we guarantee that the transaction will abort.
This is not true: a conflict occurs only if some key (or key range) actually read in your transaction has undergone a change between getting the read version and the commit. The database guarantees a read version at least as large as any already committed version.
In this case, the database will return 6 (or higher) as the read version. The read version is a global property, across all keys. If version 6 was committed before the transaction was started, then it is guaranteed that the transaction will be assigned a read version of at least 6.
What if, at the time of getting the read version, version 6 is not committed yet, so the database returns 5, for example, but while the transaction is working on its logic someone else commits a transaction with timestamp 6?
T1 starts and receives read version 5.
T2 commits with version 6 and writes a version for item x.
T1 reads x, but since its read version is 5, we don’t return what is written by T2.
T1 will abort at conflict checking.
If we had returned the most recent version we had (regardless of the read version), T1 would not have had to abort.
In any case, whatever has happened in the system, do you agree that when the client wants to read a key and we see a version with a timestamp higher than the read version, the transaction will abort? If I have versions 4 and 6 for x, and the client requests to read x with read version 5 (I explained above how this can happen), this transaction will surely be aborted if we return 4, because at conflict checking we see that the version read by the transaction has been overwritten.
Aborting transactions as soon as we see that the read version forces reading stale versions is one way to save time and resources. But what I am saying is that we don’t need a read version for read-write transactions at all. For read-write transactions that need to be committed, we can freely read the most recent version that we have for each key. If there is an issue that causes a violation of serializability, that’s fine, as conflict checking will abort it. By reading the most recent version we increase the chance of the transaction being committed.
I think we need to get a read version only for read-only snapshot reads.
I think that’s only true insofar as there is a delay between getting a read version and your first read. Once you’ve done any read, you are bound to that version, at least in the sense that it will be the basis for determining whether you have a conflict. Any subsequent reads that happen at newer versions would be grounds for conflicting the transaction.
Some other concerns –
If you know that your transaction is going to conflict, why read the newer version at all? Doing so also exposes you to seeing inconsistent versions of the data, which could be bad if you have any other side-effects besides committing.
Reading the latest data is prone to another problem – namely that you may see data that hasn’t actually been committed. Data can move to the storage server prior to the commit being fully realized.
If you’re always reading the most recent version, then your transaction body will see non-serializable results during execution, as no promises are made about the consistency of reads between two (or more) different keys. Any execution that sees non-serializable results will abort at conflict checking time, but this still breaks developer expectations. (I’ve worked on a system that behaves in this fashion, and it’s difficult to widely communicate and explain that you can’t rely upon nor ASSERT() database invariants in transaction bodies.) But strictly in terms of serializable correctness, I agree with you.
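To illustrate that point, here is a toy example in plain Python (not FoundationDB code; the store layout and values are made up): suppose every committed transaction preserves x + y == 100, but the transaction body reads the latest version of each key at different moments.

```python
# Toy example: "latest" reads of two keys taken at different moments.
store = {"x": [(4, 70)], "y": [(4, 30)]}   # key -> list of (version, value)

def read_latest(key):
    return store[key][-1][1]

x = read_latest("x")      # 70, from the version-4 state

# Another transaction commits at version 6, moving 10 from x to y.
store["x"].append((6, 60))
store["y"].append((6, 40))

y = read_latest("y")      # 40, from the version-6 state
print(x + y)              # 110, not 100 -- the invariant looks broken mid-transaction
```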
There are a couple of other purposes that GetReadVersion serves that one would need to replace in order to implement such a feature. AJ mentioned one: an internal optimization within FoundationDB allows data that isn’t committed yet to propagate to storage servers, and the GRV logic is what prevents clients from seeing that uncommitted data. GRVs also serve as rate limiting, which would need to be done some other way.
I see what you mean, and I mentioned at the beginning that I know we do this for isolation. However, withholding the most recent version to keep the snapshot consistent does not avoid the abort; the transaction will abort anyway. In other words, the database doesn’t return the latest version in order to keep the snapshot consistent, but by not returning the most recent version it guarantees that your transaction will abort.
But regarding uncommitted versions showing up in the version chain, I see your point. If we return the most recent version, we could violate serializability; I didn’t consider uncommitted versions. But again, note that since I think no version would ever show up in the version chain before at least the resolvers consider it committed, not returning the most recent version still guarantees an abort.
I think as soon as we see a version higher than the read version, we shouldn’t bother continuing the transaction, as it will be aborted, whether this newer version is committed or not.
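For illustration, here is a small sketch of that early-abort behavior in plain Python (hypothetical, not the FoundationDB API; `TransactionDoomed` and `read_or_abort` are made-up names):

```python
# Hypothetical early-abort sketch (plain Python, not the FoundationDB API).
class TransactionDoomed(Exception):
    pass

versions = {"x": [4, 6]}  # every version visible in the chain, committed or not

def read_or_abort(key, read_version):
    newest = max(versions[key])
    if newest > read_version:
        # The key already has a version newer than our snapshot; a
        # read-write transaction is certain to fail conflict checking,
        # so stop doing work now instead of waiting for the resolver.
        raise TransactionDoomed(f"{key} written at {newest} > read version {read_version}")
    return max(v for v in versions[key] if v <= read_version)

try:
    read_or_abort("x", read_version=5)
except TransactionDoomed as err:
    print("abort early:", err)
```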
I’m getting a bit confused on which side of this we’re discussing here…
If this is the “I have a read version, and a value with a version >= my read version is available on the storage server” case, then that’s exactly what “Abort conflicting transactions before commit” is about. If we’re discussing the “always read the most recent version, and rely on the resolver to reject non-serializable reads in read-write transactions” case, then we don’t have an issue open for that. You, or someone else, is welcome to file one.
There’s differing amounts of client changes needed to support each of these approaches. Currently there do exist “read only” transaction objects and “read-write” transaction objects in some client bindings, but the only difference is that the read-only transaction objects hide the write methods. To the FoundationDB client, we mostly can’t tell the difference between a read-only transaction and a read-write one — a read-write transaction is basically a read-only one that you call commit() on.
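For example, with the Python bindings the two look something like this (a minimal sketch, assuming a reachable cluster and made-up keys; it omits the usual retry loop for brevity):

```python
import fdb

fdb.api_version(630)   # use whatever API version your client supports
db = fdb.open()        # assumes a default cluster file is available

# "Read-only" usage: an ordinary transaction that we simply never commit.
tr = db.create_transaction()
value = tr[b'some-key']        # a read; a read version is acquired for it
print(value.wait())

# "Read-write" usage: the same kind of object, plus a commit() at the end.
tr2 = db.create_transaction()
tr2[b'other-key'] = b'other-value'
tr2.commit().wait()
```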
The proposal to allow aborting transactions based on read results before commit is a way to save work within our current model. Read-write transactions would still read a consistent snapshot, but we’d be more intelligent about not committing transactions that we know will fail conflict checking. Due to this mostly being compatible with the current model, I think we could get away with just offering “abort on reads” as a transaction option, and most people could upgrade their existing code easily.
What you’ve proposed would require a much stronger separation in the client between read-only transactions and read-write transactions. Read-only transactions read from a consistent snapshot and will never abort. Read-write transactions may see non-serializable read results, but have a higher chance of committing than current read-write transactions. Due to the semantics change, I don’t think there’s as clear an upgrade path, and it’d be more disruptive to clients to drive adoption of this. Some users would happily accept the extra work for the reduced conflict rate.
I’m not claiming either approach is better or worse. I think pushing for users to more clearly annotate read-only transactions from read-write ones is a great long-term goal anyway.
Ok, I think I understand the idea here, and I believe this would also require us to send versions for all of the read conflict ranges in a transaction, is that right? The optimization you’d have available can be thought of as a chance to upgrade any older reads to your newest read version if they weren’t modified between their read version and the newest one, but you wouldn’t want to assume all reads happened at the oldest version.
Precisely, yes. And that “chance to upgrade” is the increased chance of a transaction committing. What actual percent drop you’d see in conflicting transactions in realistic workloads though, I have no idea.
You could probably do something a little more bandwidth-optimal than sending one version per read range, I suppose. If x was written at versions 10 and 20, then anything in between works. You could probably thread a hint through with the read so that if version V is still the most recent version, then return V; otherwise return the most recent version. Then you’d at least be able to have one version apply to a vector of ranges, but it’s still extra overhead that I hadn’t mentally factored in before.
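Roughly, the conflict check with per-read versions might look like this toy sketch (plain Python, not the actual resolver protocol; the `writes`/`reads` shapes are made up):

```python
# Toy sketch of per-read versions at conflict-checking time.
writes = {"x": [10, 20], "y": [12]}   # committed write versions per key

def conflicts(reads, commit_version):
    """`reads` maps each key to the version at which it was actually read."""
    for key, read_at in reads.items():
        if any(read_at < v < commit_version for v in writes.get(key, [])):
            return True
    return False

# x was read at version 20 and y at version 12; the transaction commits at 25.
print(conflicts({"x": 20, "y": 12}, commit_version=25))   # False -- nothing changed after each read

# With a single transaction-wide read version of 12, the write of x at 20 forces an abort.
print(conflicts({"x": 12, "y": 12}, commit_version=25))   # True
```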
Just to clarify, I am mainly trying to understand FoundationDB, not to suggest changing it at this time. I couldn’t understand the reason for using read versions for read-write transactions, so I asked for help.
In summary, this is the idea: if we guarantee that only committed versions are written to storage, then for read-write transactions we don’t need to worry about anything else and we can just return the most recent version that we have. This way we don’t lose anything while increasing the chance of successful commits.
To use this idea, however, we need to:
- Have a clear separation between read-write and read-only transactions.
- Have only committed versions on storage. (I feel we might be able to remove this requirement if we update our resolvers when we roll back storage due to a failure, but I am not sure.)
Since we don’t satisfy the requirements above, I think we cannot use the “return-the-most-recent-version” idea now.
(As an aside: we actually used this idea in GRIT, a protocol we designed for distributed transactions across microservices at eBay. In GRIT, since we only append committed transactions to the transaction log, we don’t need to worry about uncommitted versions showing up in storage. https://tech.ebayinc.com/engineering/grit-a-protocol-for-distributed-transactions-across-microservices/)
Setting that idea aside, the early-abort idea is definitely a good one: if the transaction is going to abort, why bother continuing it?