The short answer is: to guarantee external consistency.
For the long answer, let's look at the alternatives:
Get Read-Version from Master
You could get a read version from the master: ask the master what the last version it gave out was, and use that as a read version.
As you correctly pointed out, this has the drawback that the client would then need to wait for all in-flight transaction batches (a write version is used for a batch of transactions) to commit. This can take a long time, and I guess this is the main reason it isn't done that way. However, it would be a correct way of getting read versions.
Another drawback here would be that there's a bigger chance of seeing future_version errors, which would increase load on the system. Last but not least, the master role can't be scaled, so putting more work on the master is generally something you want to avoid (even though giving out read versions isn't super expensive, it would still require the master to get a transaction budget from Ratekeeper, batch requests, and maintain potentially thousands of network connections from clients and server processes).
Get Read-Version from one Proxy
You can actually do this by setting the flag CAUSAL_READ_RISKY (I am not a fan of the name, as I would argue that in almost no case is this actually risky). In that case the client asks a random proxy for the last committed version.
This is probably the cheapest way of doing it (and I would argue in most deployments this should be the default).
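For illustration, here is a minimal sketch of how you would opt into this per transaction with the Python bindings (it assumes an installed `fdb` package and a reachable cluster):

```python
import fdb

fdb.api_version(630)   # select the API version before any other calls
db = fdb.open()        # connects via the default cluster file

tr = db.create_transaction()
# Ask a single (random) proxy for its last committed version instead of
# doing the full get-read-version round trip.
tr.options.set_causal_read_risky()

read_version = tr.get_read_version().wait()
print(read_version)    # all reads in `tr` will be served at this version
```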
However, this strategy comes with drawbacks:
- You could have a stale proxy. In that case the read version would be old, which increases the chance of transaction conflicts (and conflicts are generally bad, as conflicting transactions use resources without doing work).
- It will violate external consistency. This means that if you start two transactions A and then B (in that order), A might get a higher read version than B - so B will logically run before A. This basically means that your transactions can be reordered. They will still be serializable (a database is allowed to reorder transactions as long as the execution is equivalent to some serial order), which is why I would argue it is in general not a big deal.
- You might not read your own writes. This is a consequence of losing external consistency. You could write something in a transaction, commit successfully, and then start another transaction that reads the same keys but sees them in the state they were in before your commit. I guess this is really the worst drawback, as it is not very intuitive to the user. But keep in mind: you still wouldn't violate serializability - any attempt to write after reading this old state will result in a conflict. (The sketch after this list illustrates the scenario.)
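To make the last point concrete, here is a sketch (Python bindings, made-up key) of the scenario; whether the stale read actually happens depends on which proxy answers:

```python
import fdb

fdb.api_version(630)
db = fdb.open()

# Transaction A: write a key and commit it successfully.
tr_a = db.create_transaction()
tr_a[b'counter'] = b'1'
tr_a.commit().wait()

# Transaction B: started right after A committed, with CAUSAL_READ_RISKY set.
tr_b = db.create_transaction()
tr_b.options.set_causal_read_risky()
# The proxy that answers might not have seen A's commit yet, so this read
# can return the pre-commit state (or None if the key did not exist before).
print(tr_b.get(b'counter').wait())
# Serializability is still preserved: if tr_b now wrote something based on
# that stale read, its commit would conflict and fail.
```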
I would like (and probably will implement) some changes to make the above problems less problematic. We can do something about each point:
- The proxies could communicate with all other proxies at least once per 10ms (I made up this number on the spot - it might be larger or smaller). This way, a proxy could bound how old its read version can be.
- I think not having external consistency is fine for most applications. If you need external consistency, you need to disable CAUSAL_READ_RISKY.
- Whenever the client commits a transaction, it could update an internally maintained max-committed version. After getting a read version it could compare that read version with the highest committed version it knows of and use the larger one. This would be free for the FDB cluster (and very cheap on the client), and it would guarantee that you always see your own writes. From the view of one client you would have almost external consistency (see the sketch after this list).
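A minimal sketch of that third idea, done entirely on the client with the Python bindings (the helper names and the client-side `max_committed_version` state are mine, not an existing API):

```python
import fdb

fdb.api_version(630)
db = fdb.open()

# Highest commit version this client has observed (hypothetical client-side state).
max_committed_version = 0

def commit_and_remember(tr):
    """Commit `tr` and remember the version it committed at."""
    global max_committed_version
    tr.commit().wait()
    max_committed_version = max(max_committed_version, tr.get_committed_version())

def begin_read_your_writes_transaction():
    """Start a transaction whose read version is at least this client's last commit."""
    # Cheaply fetch a (possibly stale) read version from a single proxy.
    grv_tr = db.create_transaction()
    grv_tr.options.set_causal_read_risky()
    proxy_version = grv_tr.get_read_version().wait()

    # Use the larger of the proxy's version and our own last committed version.
    tr = db.create_transaction()
    tr.set_read_version(max(proxy_version, max_committed_version))
    return tr

# Usage: a write followed by a read that is guaranteed to observe it.
tr = db.create_transaction()
tr[b'my-key'] = b'my-value'
commit_and_remember(tr)

tr2 = begin_read_your_writes_transaction()
print(tr2.get(b'my-key').wait())  # sees b'my-value' even if the proxy was stale
```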
With these simple changes (and the first might have been done already, but I need to check the code), clients could only observe the missing external consistency if they communicate with each other through channels other than FDB (for example through direct network communication or another database). If that is the case in your application, you might not want to use CAUSAL_READ_RISKY.
Don’t get Read-Versions
This is actually a valid strategy. Whenever you commit, you get back a version that you can use as the read version for your next transaction. Obviously, if your client doesn't generate any load, you will end up with a stale read version and all your transactions will abort (either by conflicting or by getting past_version errors).
However, you could measure throughput at your client and decide to fetch a read version from a proxy only if your last commit is too old. Clients could even share their versions with each other, although I am not convinced this can be done more efficiently than just using the proxies.
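A sketch of that strategy with the Python bindings (the staleness threshold and the client-side bookkeeping are made up for illustration):

```python
import time
import fdb

fdb.api_version(630)
db = fdb.open()

# Version and wall-clock time of this client's last commit (hypothetical client-side state).
last_commit_version = None
last_commit_time = 0.0
MAX_VERSION_AGE = 1.0   # seconds; made-up threshold before we consider our version too stale

def begin_transaction():
    """Start a transaction, reusing our last commit version as the read version if it is fresh."""
    tr = db.create_transaction()
    if last_commit_version is not None and time.time() - last_commit_time < MAX_VERSION_AGE:
        # Skip the proxies entirely and read at the version of our own last commit.
        tr.set_read_version(last_commit_version)
    # Otherwise do nothing: the normal get-read-version path is used lazily on the first read.
    return tr

def commit_and_remember(tr):
    """Commit and record the version/time of the commit for the next begin_transaction()."""
    global last_commit_version, last_commit_time
    tr.commit().wait()
    last_commit_version = tr.get_committed_version()
    last_commit_time = time.time()
```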
I hope this explanation helps. There might be other ways of optimizing this part, and I would be happy to hear about them.