Hi there. I was recently reading the FDB paper published at SIGMOD ’21, and one design decision confused me.
It mentioned in section 2.4.1:
A transaction is considered committed after all designated LogServers have replied to the Proxy, which reports the committed version to the Sequencer (to ensure that later transactions’ read versions are after this commit) and then replies to the client.
Why should it report the committed version to the Sequencer? (Step 3.3 in Figure 1)
I don’t think it’s necessary for the sequencer to maintain the last committed version. And the descriptions in other parts of the paper also don’t seem to emphasize this.
Suppose we define the last committed version as lcv, and the commit version that is sent to the Proxy (possibly not yet committed) as ucv.
The Proxy then asks the Sequencer for a read version that is guaranteed to be no less than any previously issued transaction commit version, and this read version is sent back to the client.
previously issued transaction commit version here is
First, the Proxy contacts the Sequencer to obtain a commit version that is larger than any existing read versions or commit versions.
commit versions here is
I’m not sure if my understanding is correct. If maintaining the last committed version is not required, then there’s no confusion. And if the last committed version is necessary in the Sequencer, can you describe the significance of maintaining it? Thanks a lot!
When a client asks for a read version, it would not be incorrect to reply with ucv. Doing so, however, would be significantly slower for the client: when it turns around and begins issuing read requests with the provided ucv as its read version, it will have to wait for all in-flight transactions to be committed and propagated to storage servers before the reads can be serviced.
lcv is tracked so that when a client is given a read version, it represents what has just been committed, and no further in-flight transactions will need to finish committing before reads at the provided lcv can be completed.
So it’s not strictly necessary for correctness, but it is required for performance.
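To make the trade-off above concrete, here is a toy Python model. It is only a sketch and not FoundationDB code: the class `ToySequencer`, its method names, and the one-by-one version counter are all hypothetical simplifications. It shows that a read version taken from lcv has nothing left to wait on, while one taken from ucv still has in-flight commits ahead of it.

```python
class ToySequencer:
    """Hypothetical, heavily simplified model of the Sequencer's bookkeeping."""

    def __init__(self):
        self.last_issued = 0     # ucv: highest commit version handed to a Proxy
        self.last_committed = 0  # lcv: highest version a Proxy reported as committed

    def next_commit_version(self):
        # Assumption: versions advance by 1; the real system uses larger gaps.
        self.last_issued += 1
        return self.last_issued

    def report_committed(self, version):
        # Corresponds to the Proxy reporting back after all LogServers replied.
        self.last_committed = max(self.last_committed, version)

    def read_version(self):
        # Hand out lcv, so reads never wait on in-flight commits.
        return self.last_committed

    def in_flight(self, read_version):
        """Issued-but-uncommitted versions a read at read_version would wait on."""
        return max(0, read_version - self.last_committed)


seq = ToySequencer()
v1 = seq.next_commit_version()
v2 = seq.next_commit_version()
v3 = seq.next_commit_version()
seq.report_committed(v1)  # only the first transaction is durable so far

rv_lcv = seq.read_version()  # read version based on lcv
rv_ucv = seq.last_issued     # what a ucv-based read version would be

print(rv_lcv, seq.in_flight(rv_lcv))  # 1 0
print(rv_ucv, seq.in_flight(rv_ucv))  # 3 2
```

In this toy run, a client reading at version 3 would have to wait for two more commits to finish and reach the storage servers, whereas a client reading at version 1 can be served immediately.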
I think this actually would be incorrect. If you get a read version which isn’t durable then you can read non-durable data during a recovery.
StorageServers will roll back any data at versions larger than RV, so I don’t think a client can end up reading data that is ultimately uncommitted. Performance does sound like a good reason to me.
If I don’t understand it properly, can you elaborate on the problem scenario, please?
Thanks a lot for your replies!
StorageServers will eventually rollback uncommitted data when they learn there was a recovery, but there’s a window where they might serve it (provided the read requests use an uncommitted read version)
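The window described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyStorageServer` and its methods are hypothetical names, and the model assumes a storage server may already hold a mutation at a version that never became durable on all LogServers.

```python
class ToyStorageServer:
    """Hypothetical multi-version store; not how FoundationDB is implemented."""

    def __init__(self):
        self.history = {}  # key -> list of (version, value), in ascending version order

    def apply(self, version, key, value):
        self.history.setdefault(key, []).append((version, value))

    def read(self, key, read_version):
        # Return the newest value whose version is <= read_version.
        visible = [val for ver, val in self.history.get(key, []) if ver <= read_version]
        return visible[-1] if visible else None

    def rollback(self, recovery_version):
        # Discard everything newer than the version the recovery settled on.
        for key in self.history:
            self.history[key] = [
                (ver, val) for ver, val in self.history[key] if ver <= recovery_version
            ]


ss = ToyStorageServer()
ss.apply(10, "k", "old")  # committed
ss.apply(20, "k", "new")  # received, but never durable on all LogServers

served_in_window = ss.read("k", 20)   # read at the uncommitted version 20
served_committed = ss.read("k", 10)   # read at the committed version 10
ss.rollback(10)                       # the storage server learns of the recovery
served_after = ss.read("k", 20)

print(served_in_window)  # new
print(served_committed)  # old
print(served_after)      # old
```

The read at version 20 returns "new" before the rollback and "old" after it, so a client holding an uncommitted read version could observe a value that never existed in the committed history. A read at the committed version 10 is unaffected.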
You can avoid this problem by sending the committed version to the StorageServers once all LogServers have persisted their logs and a successful commit has been returned to the client. The StorageServers then keep this information for subsequent read requests.
So I think it’s OK for the Sequencer to only guarantee that the read version is greater than all allocated commit versions (even ones not yet committed), if correctness is the only consideration. You can ensure that uncommitted data is not read by maintaining MVCC at the StorageServers. If the client requests data with an uncommitted read version, either let the client wait for that version to become committed (although it may be rolled back due to a LogServer failure, which may require more design here), or just report that the version is uncommitted and let the client retry later. This does not affect the correctness of the system, but the performance may be worse.
And I think the MVCC of the data should have been maintained at StorageServers, as mentioned in the paper (5s MVCC window maintained in StorageServers), but not specifically how.
So I think the only problem here is that the client will have to wait for the data on StorageServers to become committed, as @alexmiller described, which may cause performance problems.
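A minimal sketch of the hypothetical design proposed in this post, in Python. Everything here is an assumption made for illustration (`GatedStorageServer`, `advance_committed`, and the `"retry"` status are invented names), and it only shows the "let the client retry later" variant:

```python
class GatedStorageServer:
    """Hypothetical storage server that only serves reads at known-committed versions."""

    def __init__(self):
        self.history = {}         # key -> list of (version, value), ascending
        self.known_committed = 0  # highest version this server was told is committed

    def apply(self, version, key, value):
        self.history.setdefault(key, []).append((version, value))

    def advance_committed(self, version):
        # Assumption: some component notifies the server once a version is durable.
        self.known_committed = max(self.known_committed, version)

    def read(self, key, read_version):
        if read_version > self.known_committed:
            # The version may still be rolled back, so do not serve it yet.
            return ("retry", None)
        visible = [val for ver, val in self.history.get(key, []) if ver <= read_version]
        return ("ok", visible[-1] if visible else None)


gated = GatedStorageServer()
gated.apply(10, "k", "old")
gated.advance_committed(10)
gated.apply(20, "k", "new")    # applied, but not yet known to be committed

first = gated.read("k", 20)    # too early: version 20 is not known committed
gated.advance_committed(20)    # commit confirmation arrives
second = gated.read("k", 20)

print(first)   # ('retry', None)
print(second)  # ('ok', 'new')
```

The extra round of waiting or retrying in `first` is the performance cost discussed earlier in the thread; the sketch does not cover what should happen if version 20 is rolled back instead of confirmed.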
Makes sense - storage servers could wait until they know that a version is committed before serving a read. That’s not how FoundationDB is currently implemented though. I see now that we’re talking about hypothetical design choices FoundationDB could have made, so you can ignore what I said. I just wanted it to be clear that if the only change you made to FoundationDB’s implementation was to reply to a GRV with an uncommitted version, that would be incorrect.