When working on my application, I find that versionstamps are frequently a footgun, specifically when attempting to read a versionstamp that was written in the current transaction.
Why does that limitation exist? I can create an incomplete versionstamp in my code and print it, and it has a byte representation, so why can’t the server return an incomplete versionstamp?
Versionstamps are just a placeholder on the client/app side. The value of a transaction’s versionstamp is only determined at commit time, more precisely when the commit proxy substitutes the actual value into the transaction’s mutations after obtaining the commit version from the sequencer. Because of this, it’s impossible for the client/app to read a versionstamp before commit time.
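To make the placeholder idea concrete, here is a minimal sketch in plain Python. It is a toy model, not FDB code: the helper names (`make_incomplete_key`, `fill_at_commit`) and the `log/` prefix are made up for illustration. It only assumes the documented 10-byte layout of a transaction versionstamp (8-byte big-endian commit version followed by a 2-byte big-endian batch order).

```python
import struct

STAMP_LEN = 10  # 8-byte commit version + 2-byte batch order
PLACEHOLDER = b"\xff" * STAMP_LEN  # stand-in bytes the client can print or serialize


def make_incomplete_key(prefix):
    """Return (key_template, offset); offset marks where the placeholder starts."""
    return prefix + PLACEHOLDER, len(prefix)


def fill_at_commit(key_template, offset, commit_version, batch_order):
    """Conceptually what happens at commit: the placeholder is overwritten."""
    stamp = struct.pack(">QH", commit_version, batch_order)  # exactly 10 bytes
    return key_template[:offset] + stamp + key_template[offset + STAMP_LEN:]


template, offset = make_incomplete_key(b"log/")
final_key = fill_at_commit(template, offset, commit_version=1234, batch_order=0)
print(template.hex())   # what the client holds before commit
print(final_key.hex())  # what is actually stored after commit
```

The template has a perfectly valid byte representation, but it is not the key that ends up in the database, so handing it back from a read would describe a key that never exists.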
I get that the complete versionstamp isn’t filled out until the transaction commits, but in Java it’s completely legal to just create an incomplete versionstamp and read it, serialize it to bytes, etc. Given that, why is it necessary for the API to throw an exception when an incomplete versionstamp is read, rather than returning an incomplete versionstamp like I would get before attempting to commit?
To try to answer the original question of why this is a thing though, let’s first look at the motivation for the read-your-writes cache. If we view transactions as functions that take a database and return a new database (a database being a set of key-value pairs), then we can model the current state of FDB as the composition of some sequence of transactions. FDB’s conflict checking preserves this even if there are concurrent writers.
Now let’s say your application has two transactions t1 and t2, and you want to make a new transaction t3 which is the composition of t1 and t2. The read-your-writes cache makes this straightforward: you simply call t1 and then t2 in the same transaction. t2 sees the effects of t1, and the result commits atomically. This is also why, for example, in the Python API the @fdb.transactional decorator creates a function that takes a Database or a Transaction. You can commit a function as its own transaction, or you can freely compose many functions into one transaction.
If the read-your-writes cache allowed you to read unresolved versionstamps by default, then this whole paradigm would no longer work. t2 would see something, but it wouldn’t be the exact effect of t1.
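A small toy model may help show how composition breaks. This is not the FDB API: the database is a plain dict, `t1` and `t2` are hypothetical transaction bodies, and `INCOMPLETE` is a made-up marker for an unresolved versionstamp. The second runner models the hypothetical behavior where the placeholder is readable inside the composed transaction.

```python
# Toy model: a "database" is a dict, a transaction body is a function on it.
INCOMPLETE = "<incomplete versionstamp>"  # hypothetical placeholder marker


def t1(db):
    # Writes an id that is a versionstamp, so its value is unknown until commit.
    db["latest_id"] = INCOMPLETE


def t2(db):
    # Reads what t1 wrote and derives a new key from it.
    db["seen:" + str(db["latest_id"])] = True


def run_separately(db, commit_version):
    """Commit t1 on its own, then run t2: t2 observes the resolved value."""
    db = dict(db)
    t1(db)
    db["latest_id"] = commit_version  # resolved when t1 commits
    t2(db)
    return db


def run_composed_with_readable_placeholder(db, commit_version):
    """Hypothetical: t2 is allowed to read the placeholder in the same transaction."""
    db = dict(db)
    t1(db)
    t2(db)  # t2 sees the placeholder, not the value t1 will really have written
    db["latest_id"] = commit_version  # resolved when the composite commits
    return db


separate = run_separately({}, commit_version=100)
composed = run_composed_with_readable_placeholder({}, commit_version=100)
print(sorted(separate))
print(sorted(composed))
```

In the composed run, t2 derives its key from the placeholder rather than from the value that is eventually committed, so the result differs from running t1 and then t2. Throwing on the read is the conservative way to avoid committing that divergent state.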
There are probably other ways for this paradigm to fall short, and in each case FDB should by default choose the safe option of not committing anything. One other way for this to fall short is transaction size limits - it could be that performing t1 and then t2 in one transaction makes the transaction too large to commit.
Something that has been quite surprising to me (and does not seem necessary) is that if you have a range with a series of complete versionstamps and one incomplete one, you can’t do a range read on any part of it, even parts that could never possibly contain whatever that versionstamp will be once it’s completed.
For example:
tx1 writes a versionstamped key vs1 and commits
tx2 writes another one vs2 and commits
tx3 writes an incomplete versionstamped key vs3, and before committing tries to read the range from vs1 to vs2. FDB throws “read or wrote an unreadable key”.
In this example we know that vs3 must come after the current read version, which must come after vs2, so we aren’t actually reading the incomplete versionstamp. Yet it still is not allowed.
I think that’s either a bug, an old version of fdb, or a quirk resulting from the fact that setting an atomic op must complete synchronously in the client API, which can be worked around by getting a read version before performing the versionstamped key op.
What version are you using? Do you still see it if you wait until the transaction has a read version before setting the versionstamped key?
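The suggested ordering can be sketched as follows. The function name `write_after_read_version` is made up; `get_read_version().wait()` and `set_versionstamped_key` are the calls from the Python binding as I understand them, and the function assumes `tr` behaves like an fdb Transaction. The `RecordingTransaction` class is only a stand-in so the sketch runs without a cluster, and the key here is raw placeholder bytes rather than a real packed versionstamped key.

```python
class _ReadyFuture:
    """Stand-in for an fdb future that is already resolved."""

    def __init__(self, value):
        self._value = value

    def wait(self):
        return self._value


class RecordingTransaction:
    """Test double standing in for an fdb Transaction (not part of fdb)."""

    def __init__(self, read_version):
        self._read_version = read_version
        self.calls = []

    def get_read_version(self):
        self.calls.append("get_read_version")
        return _ReadyFuture(self._read_version)

    def set_versionstamped_key(self, key, value):
        self.calls.append("set_versionstamped_key")


def write_after_read_version(tr, packed_key, value):
    """Fetch the read version first, then issue the versionstamped write."""
    read_version = tr.get_read_version().wait()  # block until the version is known
    tr.set_versionstamped_key(packed_key, value)
    return read_version


tr = RecordingTransaction(read_version=100)
rv = write_after_read_version(tr, b"log/" + b"\xff" * 10, b"payload")
print(rv, tr.calls)
```

With a real transaction, the point of the ordering is that the client already knows its read version when the versionstamped write is recorded, so it can bound which keys the incomplete versionstamp could possibly resolve to.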
Sidenote: tracking unreadable keys significantly complicates the implementation of the read-your-writes cache. While it’s probably theoretically possible to fix the above quirk, I kind of doubt anybody will invest enough to pull it off, or that it would be worth the maintenance burden even if someone did.
Looks like you’re right - it only affects get requests. I find this surprising. My naive reading of the code is that it wouldn’t be too difficult to apply this to get range requests as well, but I’m not 100% sure.
Anyway I can confirm that you’re right. Current status is “gets only”.
Do you still see it if you wait until the transaction has a read version before setting the versionstamped key?
Waiting until after the transaction has a read version helped a lot, thank you!
Limiting a range to only show versionstamps with tx versions less than the current read version works now (incomplete versionstamps are hidden). However, if I include versionstamps whose version is equal to the current read version, I’m back to “read or wrote an unreadable key”.
I don’t think that’s right - I believe a tx’s commit version will always be greater than its read version, so the incomplete versionstamps cannot be in that range. Or have I misunderstood something?
The reason this is a bit of an issue is that if we:
Commit tx1 with a versionstamped key
Immediately start a new transaction tx2, in which we get the read version
Then the read version in tx2 is (usually) equal to the commit version of tx1. So we can’t write a new versionstamp in tx2 and then get a range that excludes that while still including the key from tx1.
The read version is equal to “the last committed version as of the time it received the request”.
On commit, “the commit proxy will ask for a new version from the master”
“FoundationDB guarantees that no version will be generated twice and that the versions are monotonically increasing.”
Since the read version is an existing version and the commit version is a new version, it follows that the commit version must always be strictly greater than the read version.
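The argument above can be checked against a tiny model. `ToySequencer` is a hypothetical stand-in, not an FDB component; the only properties it takes from the quoted statements are that a read version is the last committed version and that each commit gets a new, larger version.

```python
class ToySequencer:
    """Hypothetical stand-in for whatever hands out versions."""

    def __init__(self):
        self.last_committed = 0

    def get_read_version(self):
        # An existing version: the last one committed so far.
        return self.last_committed

    def commit(self):
        # A new version, strictly greater than any handed out before.
        self.last_committed += 1
        return self.last_committed


seq = ToySequencer()
tx1_commit = seq.commit()          # tx1 commits a versionstamped key
tx2_read = seq.get_read_version()  # tx2 starts immediately afterwards
tx2_commit = seq.commit()          # tx2 commits later
print(tx1_commit, tx2_read, tx2_commit)
```

In this model tx2's read version equals tx1's commit version, while tx2's own commit version is strictly greater than its read version, which is the situation described in the steps above.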
Since the read version is the last committed version, it means this range overlaps with versionstamps added in the previous transaction, preventing you from reading them.
I think if the code that calls getVersionstampKeyRange (or that function itself) were updated to add 1 to the read version, it would solve this.
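Here is a rough sketch of why the off-by-one matters. It is a toy model only: `possible_range` and `overlaps` are made-up helpers, and the real logic in getVersionstampKeyRange may differ in detail. The assumption is that the unreadable range for an incomplete versionstamped key starts at the smallest version the key could resolve to.

```python
import struct

STAMP_LEN = 10  # 8-byte version + 2-byte batch order


def possible_range(prefix, min_version):
    """Keys an incomplete versionstamped key under `prefix` might resolve to."""
    begin = prefix + struct.pack(">QH", min_version, 0)
    end = prefix + b"\xff" * STAMP_LEN
    return begin, end


def overlaps(key, key_range):
    begin, end = key_range
    return begin <= key <= end


read_version = 100
# Key written by the previous transaction, committed at version 100.
tx1_key = b"log/" + struct.pack(">QH", read_version, 0)

current = possible_range(b"log/", read_version)       # lower bound = read version
proposed = possible_range(b"log/", read_version + 1)  # lower bound = read version + 1

print(overlaps(tx1_key, current))
print(overlaps(tx1_key, proposed))
```

With the lower bound at the read version, the key committed by the previous transaction falls inside the unreadable range. Starting the range at read version + 1 excludes it, and nothing is lost, since the commit version must be strictly greater than the read version.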
I think you’re right. I think a PR for that would probably be welcome. I would approve it myself but I recently made a career change and I don’t have merge capabilities anymore.