FDB does not have predicate pushdown, so you will need to issue a parallel range scan for each block name. If you do not know which block names actually exist in a large possible range, you can maintain another keyspace that is essentially a unique index of the block names that exist; one range scan over that space first discovers the unique block names to use for the parallel range scans.
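As a minimal sketch of that index-first pattern in Python with the standard fdb bindings (the ("block_names", <block_name>) index keyspace and the per-block layout here are hypothetical stand-ins for whatever you actually use):

```python
import fdb

fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def list_block_names(tr):
    # One range scan over the hypothetical index keyspace discovers every
    # block name that actually exists.
    r = fdb.tuple.range(('block_names',))
    return [fdb.tuple.unpack(kv.key)[1] for kv in tr.get_range(r.start, r.stop)]

@fdb.transactional
def read_block_versions(tr, block_name):
    # One range scan per discovered block name.
    r = fdb.tuple.range(('blocks', block_name))
    return list(tr.get_range(r.start, r.stop))

# The per-block scans are independent, so they can be issued in parallel
# (threads, asyncio, etc.); a serial loop is shown for brevity.
for name in list_block_names(db):
    versions = read_block_versions(db, name)
```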
For your given solution, just one question: why not directly put the commit_time into the value? Leave it null or INFINITY before commit and set it after commit.
I was assuming that you would want to write block name records from a transaction before the commit time of the transaction is known, as I thought was the case in your original question. This indirection would allow you to do that.
However, if you don’t mind the cost of going back to write each KV pair a second time, then placing the commit timestamp in the value is not what I would recommend.
I suggest the following, which would also help mitigate the issue of scanning all versions of a block within the MVCC window. The schema is:
("blocks", <block_name>, "transactions", <transaction_id> => <value>
("blocks", <block_name>, "timestamps", <commit_timestamp> => <value>
("commits", <transaction_id> => <commit_timestamp>
("transactions", <transaction_id>, <block_name>) => null
So initially, you have block updates and you have a transaction_id, but no commit_timestamp.
For each block update, you set
("blocks", <block_name>, "transactions", <transaction_id>) = <value>
("transactions", <transaction_id>, <block_name>) = null
Each FDB transaction you commit can cover 1 or more block updates. Once all of the block updates are committed successfully, then set
("commits", <transaction_id>) = <commit_timestamp>
Then, one or more instances of a background cleanup job can scan the “commits” keyspace to find committed transactions, and for each of those, scan the “transactions” keyspace to find the blocks updated in that specific transaction. For each of those block updates, do this (a code sketch follows the list):
- read ("blocks", <block_name>, "transactions", <transaction_id>)
- clear ("blocks", <block_name>, "transactions", <transaction_id>)
- set ("blocks", <block_name>, "timestamps", <commit_timestamp>) = <value>, using the value read above and the commit timestamp obtained for the transaction earlier
- clear ("transactions", <transaction_id>, <block_name>)
These changes must be done transactionally for a specific block update, but you can use many FDB transactions, each covering 1 or more updates.
Once all of the block updates have been “converted” from the “transactions” space to the “timestamps” space under the <block_name>'s keyspace, then clear the ("commits", <transaction_id>) key.
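A sketch of one cleanup pass, again using the helpers above. Each block update is converted in its own FDB transaction here for simplicity; batching several per transaction works too, as noted:

```python
import fdb  # plus the key helpers sketched earlier

@fdb.transactional
def committed_txns(tr):
    # Scan the "commits" keyspace for committed transactions.
    r = fdb.tuple.range(('commits',))
    return [(fdb.tuple.unpack(kv.key)[1], fdb.tuple.unpack(kv.value)[0])
            for kv in tr.get_range(r.start, r.stop)]

@fdb.transactional
def blocks_for_txn(tr, txn_id):
    # Scan the "transactions" keyspace for the blocks this transaction touched.
    r = fdb.tuple.range(('transactions', txn_id))
    return [fdb.tuple.unpack(kv.key)[2] for kv in tr.get_range(r.start, r.stop)]

@fdb.transactional
def convert_block_update(tr, txn_id, commit_ts, block_name):
    # Read, clear, set, clear: all four mutations commit atomically, and the
    # present() check makes the conversion safe to retry or to race with
    # another cleanup instance.
    value = tr[block_txn_key(block_name, txn_id)]
    if value.present():
        tr[block_ts_key(block_name, commit_ts)] = bytes(value)
        del tr[block_txn_key(block_name, txn_id)]
    del tr[txn_block_key(txn_id, block_name)]

@fdb.transactional
def clear_commit(tr, txn_id):
    del tr[commit_key(txn_id)]

def cleanup_pass(db):
    for txn_id, commit_ts in committed_txns(db):
        for block_name in blocks_for_txn(db, txn_id):
            convert_block_update(db, txn_id, commit_ts, block_name)
        clear_commit(db, txn_id)
```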
With all that done, to look up the value of a block at a given timestamp, do the following (a sketch follows the list):
- Using a reverse-order getRange() with a limit of 1, ending at your target timestamp, read the newest committed version at or below the target directly from ("blocks", <block_name>, "timestamps").
- Using a range read, read ALL of the versions of the block from ("blocks", <block_name>, "transactions"), and look up their commit timestamps as I explained earlier.
- Return the newest version at or under your target timestamp from either of those two reads.
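A sketch of that read path. The KeySelector makes the reverse scan inclusive of the target timestamp, and both candidates are compared so the newer one wins:

```python
import fdb  # plus the key helpers sketched earlier

@fdb.transactional
def read_block_at(tr, block_name, target_ts):
    best_ts, best_val = None, None

    # Candidate 1: newest already-converted version at or below target_ts,
    # via one reverse getRange with limit 1.
    begin = fdb.tuple.pack(('blocks', block_name, 'timestamps'))
    end = fdb.KeySelector.first_greater_than(block_ts_key(block_name, target_ts))
    for kv in tr.get_range(begin, end, limit=1, reverse=True):
        best_ts = fdb.tuple.unpack(kv.key)[3]
        best_val = kv.value

    # Candidate 2: versions still sitting in the "transactions" space; resolve
    # each one's commit timestamp through the "commits" keyspace.
    r = fdb.tuple.range(('blocks', block_name, 'transactions'))
    for kv in tr.get_range(r.start, r.stop):
        txn_id = fdb.tuple.unpack(kv.key)[3]
        commit = tr[commit_key(txn_id)]
        if not commit.present():
            continue  # never marked committed, so not visible to readers
        ts = fdb.tuple.unpack(bytes(commit))[0]
        if ts <= target_ts and (best_ts is None or ts > best_ts):
            best_ts, best_val = ts, kv.value
    return best_val
```

Since the whole lookup runs in one FDB transaction, it sees a consistent snapshot: a concurrently running cleanup job cannot make a version appear in both spaces or in neither.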
The idea here is that while you’re still doing two writes per block update, the second write (and a few other mutations) improves the efficiency of reads by moving block update records into a timestamp-ordered keyspace containing only committed updates.
By using a background cleanup job whose work queue is stored in the database, you don’t have to worry about a single process dying in the middle of retracing its block updates to put the commit timestamp somewhere in the database for each block update.