Versionstamps and DR streaming

arohner · July 12, 2024, 2:43pm

When using a DR cluster, my understanding is that the DR cluster’s keys and values will not be perfectly identical to the source’s if one uses set_versionstamped_key, because the DR cluster will call set_versionstamped_key itself, and its clock is not synchronized with the source’s. Is that right?

In application code, if I call set_versionstamped_key, then get_range (to find the key), then clear(complete_versionstamp), the clear will not be effective when those operations are copied over to the DR cluster, because the key on disk there has a different complete versionstamp. Is that correct?

If so, what are my options?

jzhou · July 12, 2024, 10:44pm

No, set_versionstamped_key won’t cause any difference in the DR cluster. The reason is that versionstamp operations are converted into SET mutations at the commit proxy via transformVersionstampMutation() calls. DR copies the converted mutations to the destination cluster, so the destination applies exactly the same mutations as the source. I.e., the source and destination clusters are identical.
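To make the commit-proxy rewrite concrete, here is a minimal Python sketch of what converting a versionstamp operation into a SET amounts to. It is not FoundationDB’s code: transform_versionstamped_key and apply are hypothetical helpers, and the sketch assumes the API-version-520+ layout in which the key passed to set_versionstamped_key ends with a 4-byte little-endian offset pointing at a 10-byte placeholder (8-byte big-endian commit version plus 2-byte batch order). Because DR ships the already-rewritten SET, the destination stores the same bytes, and a later clear by the complete key removes it on both sides.

```python
import struct

VERSIONSTAMP_LEN = 10  # 8-byte commit version + 2-byte batch order


def transform_versionstamped_key(param: bytes, commit_version: int, batch_order: int) -> bytes:
    """Hypothetical model of the commit-proxy rewrite of SetVersionstampedKey.

    Assumes the API >= 520 layout: the last 4 bytes of param are a little-endian
    offset to a 10-byte placeholder inside the key. Returns the concrete key that
    ends up in an ordinary SET mutation.
    """
    (offset,) = struct.unpack("<I", param[-4:])
    key = bytearray(param[:-4])  # the offset suffix is not part of the stored key
    if offset + VERSIONSTAMP_LEN > len(key):
        raise ValueError("versionstamp placeholder does not fit in the key")
    # Both fields are big-endian, so byte order of stamps matches commit order.
    key[offset:offset + VERSIONSTAMP_LEN] = struct.pack(">QH", commit_version, batch_order)
    return bytes(key)


def apply(store: dict, mutations: list) -> None:
    """Apply already-rewritten mutations, in order, to a dict standing in for a cluster."""
    for op, key, value in mutations:
        if op == "SET":
            store[key] = value
        elif op == "CLEAR":
            store.pop(key, None)


prefix = b"events/"
param = prefix + b"\x00" * VERSIONSTAMP_LEN + struct.pack("<I", len(prefix))

# The source's commit proxy fills in the versionstamp exactly once (values made up).
concrete = transform_versionstamped_key(param, commit_version=1_000_123, batch_order=7)

source, destination = {}, {}
batch = [("SET", concrete, b"payload")]
apply(source, batch)
apply(destination, batch)  # DR ships the rewritten SET, not the versionstamp op
assert source == destination == {concrete: b"payload"}

# Later: get_range finds the key, then the app clears it by its complete key.
batch = [("CLEAR", concrete, None)]
apply(source, batch)
apply(destination, batch)
assert source == destination == {}
```

Since the destination never re-runs the versionstamp operation, the get_range/clear sequence from the question sees the same key on both clusters.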

miridius · July 15, 2024, 9:34am

Thanks for the answer!

Are there guarantees that versionstamps remain unique and absolutely ordered (i.e., no new versionstamp will collide with, or be older than, any that was previously copied over via DR) in the following recovery scenarios:

1. If we fail over and make the DR cluster active?
2. If we restore the main cluster from the DR cluster with a second fdbdr switch or similar?
3. If we create a new cluster from an S3 backup?

jzhou · July 17, 2024, 4:37am

The answer to 1 and 2 is yes; there is no guarantee for 3.

1. If fdbdr switch is issued, the destination cluster’s commit version is bumped (foundationdb/fdbclient/DatabaseBackupAgent.actor.cpp at e89cf52f6c9a96985107492bb06029495f8f9809 · apple/foundationdb · GitHub) to be larger than the source cluster’s before the switchover completes. This guarantees that versionstamps are monotonically increasing and unique.
2. A second fdbdr switch follows the same process, including the version bump, so the main cluster ends up with a higher version than the DR cluster, which provides the same guarantee.
3. There is no guarantee here, because no such version bump happens during a restore. Someone has to use the fdbcli> advanceversion command to advance the commit version manually.
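The three scenarios can be modelled with a toy cluster that only tracks its last commit version. This is a simplified sketch, not the DatabaseBackupAgent logic: Cluster and bump_past are hypothetical names, bump_past stands in for the version bump performed during fdbdr switch, and calling it on the restored cluster is only a rough analogue of fdbcli> advanceversion. The starting versions are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy cluster that only tracks the last commit version it handed out."""

    name: str
    version: int

    def commit(self) -> int:
        # Real clusters advance about one million versions per second;
        # a step of 1 keeps the numbers readable.
        self.version += 1
        return self.version

    def bump_past(self, other_version: int) -> None:
        # Hypothetical stand-in for the DR version bump: make sure every
        # future commit version is strictly greater than other_version.
        self.version = max(self.version, other_version)


main = Cluster("main", version=500)
dr = Cluster("dr", version=300)

# Scenario 1: fdbdr switch. The source is reachable, so DR bumps past it.
dr.bump_past(main.version)
assert dr.commit() > 500

# Scenario 2: a second switch bumps main past everything DR committed.
for _ in range(3):
    dr.commit()
main.bump_past(dr.version)
assert main.commit() > dr.version

# Scenario 3: restore into a new cluster. Nothing bumps its version,
# so (with this made-up start) it can hand out versions main already used.
restored = Cluster("restored", version=10)
assert restored.commit() <= main.version

# Rough analogue of running fdbcli> advanceversion with a large enough version.
restored.bump_past(main.version)
assert restored.commit() > main.version
```

The key difference is that a switch always has the other cluster’s version available to bump past, while a restore has no such step unless an operator adds one.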

danm · July 17, 2024, 2:17pm

I notice you specifically call out fdbdr switch, but our understanding is that in a ‘true’ disaster scenario (where the source cluster has become unavailable for some reason) you’d want to run fdbdr abort --dstonly instead. Would the same guarantees apply in that case as with fdbdr switch?

From my reading of the code you linked, it looks like yes, but I’ve never written C++ before and it’s been about 20 years since I last wrote C, so I don’t trust my own evaluation.

jzhou · July 17, 2024, 5:33pm

In a “true” disaster, the destination cluster won’t be able to cleanly get all the data from the source cluster. That is, a few seconds’ worth of mutations could be missing (if this is a concern, please consider the multi-region configuration, which is a different topic). When aborting DR, the destination also does a version bump (foundationdb/fdbclient/DatabaseBackupAgent.actor.cpp at 6cd5ad2ffe1d246e695bb51d0184e74d9bac7217 · jzhou77/foundationdb · GitHub), and the comment there says:

// The +1 is because we want to make sure that a versionstamped operation can’t reuse
// the same version as an already-applied transaction.

So the guarantee here is only a “weak” one: the missed mutations could themselves contain versionstamp operations, and since the destination never saw their versions, the bump cannot account for them.
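The “weak” part can be seen with a few made-up numbers. In this sketch, abort_dst_only is a hypothetical model of the abort-time bump (not the actual DatabaseBackupAgent code), and versions step by 1 instead of roughly one million per second. The destination only knows the last version DR applied, so its new commits are guaranteed to be above that version but can land on versions the source used for versionstamped writes that never made it across.

```python
def abort_dst_only(dest_version: int, last_applied: int) -> int:
    """Hypothetical model of the abort-time bump: return the version the destination
    resumes from, so that its next commit is strictly above last_applied."""
    return max(dest_version, last_applied)


# Made-up commit versions of versionstamped writes on the source before the disaster.
source_versionstamped = [970, 985, 995, 998]
last_applied = 990  # DR had applied the source's mutations up to this version
lost = [v for v in source_versionstamped if v > last_applied]

resume_from = abort_dst_only(dest_version=980, last_applied=last_applied)
new_versions = [resume_from + i for i in range(1, 11)]  # next ten commits on the destination

# Guaranteed: nothing new reuses a version DR actually applied.
assert all(v > last_applied for v in new_versions)

# Not guaranteed: versions of source writes that never reached the destination.
reused = sorted(set(new_versions) & set(lost))
print(reused)  # [995, 998]
```

This only matters if something outside the destination cluster already observed those lost versionstamps; within the destination itself, versionstamps stay unique.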
