Versionstamps and DR streaming

When using a DR cluster, my understanding is that if one uses set_versionstamped_key, the DR cluster’s keys and values will not be perfectly identical to the source cluster’s, because the DR cluster will call set_versionstamped_key itself and its clock is not synchronized. Is that right?

In application code, if I call set_versionstamped_key, then get_range (to find the key), then clear(complete_versionstamp), will the clear be ineffective when those operations are copied over to the DR cluster, because the key on disk there has a different complete versionstamp? Is that correct?

If so, what are my options?


No, set_versionstamped_key won’t cause any difference in the DR cluster. Versionstamp operations are converted into SET mutations at the commit proxy via transformVersionstampMutation() calls, and DR copies the converted mutations to the destination cluster. The destination therefore applies exactly the same mutations as the source, so the source and destination clusters are identical.
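To make this concrete, here is a toy Python model of the conversion. It is a simplified sketch, not FDB's implementation: it assumes the API-version-520+ key layout (a 10-byte placeholder, with a 4-byte little-endian offset of that placeholder appended to the end of the key) and a 10-byte versionstamp made of an 8-byte big-endian commit version plus a 2-byte big-endian batch index. The helper names (make_versionstamped_key, transform_versionstamp_mutation, apply) and the dict "clusters" are hypothetical. It also walks through the clear(complete_versionstamp) pattern from the question: because DR replays the already-converted SET and the later CLEAR byte for byte, the clear hits the same key on the destination.

```python
import struct

PLACEHOLDER = b"\x00" * 10  # bytes the client leaves for the proxy to fill in

def make_versionstamped_key(prefix: bytes, suffix: bytes = b"") -> bytes:
    """Build the param of a SetVersionstampedKey mutation: prefix, a 10-byte
    placeholder, suffix, then the placeholder's offset as 4 little-endian bytes."""
    return prefix + PLACEHOLDER + suffix + struct.pack("<I", len(prefix))

def transform_versionstamp_mutation(param: bytes, commit_version: int, batch_index: int) -> bytes:
    """Toy version of the proxy-side rewrite: drop the offset suffix and write the
    versionstamp (8-byte big-endian commit version + 2-byte big-endian batch
    index) over the placeholder. The result is the key of a plain SET."""
    offset = struct.unpack("<I", param[-4:])[0]
    key = bytearray(param[:-4])
    key[offset:offset + 10] = struct.pack(">QH", commit_version, batch_index)
    return bytes(key)

def apply(store: dict, mutation: tuple) -> None:
    """Apply an already-converted mutation to an in-memory 'cluster'."""
    op, key, value = mutation
    if op == "SET":
        store[key] = value
    elif op == "CLEAR":
        store.pop(key, None)

source, dest, mutation_log, snapshots = {}, {}, [], []

# Transaction 1 on the source: the proxy stamps the key, so only a SET is logged.
stamped = transform_versionstamp_mutation(make_versionstamped_key(b"log/"), 1_000_000, 0)
mutation_log.append(("SET", stamped, b"payload"))

# A later transaction reads the key back and clears it by its complete versionstamp.
mutation_log.append(("CLEAR", stamped, None))

for mutation in mutation_log:
    apply(source, mutation)  # what the source cluster stores
    apply(dest, mutation)    # DR replays the same converted mutation verbatim
    snapshots.append((dict(source), dict(dest)))
```

Nothing in the replay path consults the destination's own version, which is why the two stores match after every step.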


Thanks for the answer!

Are there guarantees that versionstamps continue to be unique and absolutely ordered (i.e., no new versionstamp will overlap with, or be older than, any that previously got copied over via DR) in the following recovery scenarios:

  1. If we failover and make the DR cluster active?
  2. If we restore the main cluster from the DR cluster by a second fdbdr switch or similar?
  3. If we create a new cluster from an S3 backup?

The answer to 1 and 2 is yes; there is no guarantee for 3.

  1. If fdbdr switch is issued, the destination cluster’s commit version will be bumped (foundationdb/fdbclient/DatabaseBackupAgent.actor.cpp at e89cf52f6c9a96985107492bb06029495f8f9809 · apple/foundationdb · GitHub) to be larger than the source cluster’s before the switchover completes. This guarantees that versionstamps remain monotonically increasing and unique.
  2. A second fdbdr switch follows the same process, including the version bump, so the main cluster ends up with a higher version than the DR cluster, which again provides the guarantee.
  3. There is no guarantee for 3, because restore performs no such version bump. Someone has to advance the commit version manually with the fdbcli> advanceversion command.
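Why the bump matters can be shown with a toy model. This is a hypothetical Python sketch, not FDB code: ToyCluster, its methods, and all version numbers are made up, and advance_version only mimics the effect of a version bump (or of fdbcli's advanceversion), not how either is implemented. Because versionstamps are big-endian, comparing their bytes gives the same order as comparing (commit version, batch index).

```python
import struct

def versionstamp(commit_version: int, batch_index: int = 0) -> bytes:
    # Big-endian encoding, so comparing the bytes matches numeric (version, batch) order.
    return struct.pack(">QH", commit_version, batch_index)

class ToyCluster:
    """Hypothetical stand-in for a cluster's commit-version counter."""

    def __init__(self, version: int):
        self.version = version

    def commit(self) -> bytes:
        self.version += 1  # simplified: real clusters advance roughly 1e6 versions per second
        return versionstamp(self.version)

    def advance_version(self, target: int) -> None:
        # Like a version bump: move forward to `target`, never backwards.
        self.version = max(self.version, target)

source = ToyCluster(version=5_000)
existing = [source.commit() for _ in range(3)]  # versionstamps already in the data

# Cases 1 and 2 (fdbdr switch): bump the new primary past the old one, then take over.
dr = ToyCluster(version=100)  # its own counter may be far behind (made-up value)
dr.advance_version(source.version + 1)
after_switch = dr.commit()

# Case 3 (restore into a fresh cluster): no automatic bump.
restored = ToyCluster(version=100)
before_advance = restored.commit()            # sorts below data that was restored
restored.advance_version(source.version + 1)  # what advanceversion has to do by hand
after_advance = restored.commit()
```

In case 3, any commit made before the manual advance can produce a versionstamp older than restored data, which is exactly the ordering violation the question asks about.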

I notice you specifically call out fdbdr switch, but to our understanding in a ‘true’ disaster scenario (where the source cluster has become unavailable for some reason) you’d want to run fdbdr abort --dstonly. Would the same guarantees apply in that case as with fdbdr switch?

From my reading of the code you linked it looks like yes, but I’ve never written C++ before and it’s about 20 years since I last wrote C, so I don’t trust my own evaluation ;)

In the case of a “true” disaster, the destination cluster won’t be able to cleanly get all the data from the source cluster: a few seconds’ worth of mutations could be missed (if this is a concern, please consider a multi-region configuration, which is a different topic). When aborting DR, the destination also does a version bump (foundationdb/fdbclient/DatabaseBackupAgent.actor.cpp at 6cd5ad2ffe1d246e695bb51d0184e74d9bac7217 · jzhou77/foundationdb · GitHub), and the comment there says:

// The +1 is because we want to make sure that a versionstamped operation can’t reuse
// the same version as an already-applied transaction.

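So the guarantee here is only a “weak” one: the bump protects versionstamps the destination has actually applied, but the missed mutations could have contained other versionstamped operations, and their versionstamps could be reused on the destination after the bump.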
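The weak guarantee can be illustrated with a toy timeline. This is a hypothetical Python sketch, not FDB code: it assumes the abort bumps the destination to one past the last applied version, as the quoted “+1” comment suggests, and takes the worst case in which the destination's next commits land on exactly the following versions. All version numbers are made up.

```python
import struct

def versionstamp(commit_version: int, batch_index: int = 0) -> bytes:
    return struct.pack(">QH", commit_version, batch_index)

# Hypothetical timeline: the source committed versions 1001..1010 before it was lost,
# but DR had only applied through 1006 on the destination.
source_versions = range(1001, 1011)
applied = 1006
copied = [versionstamp(v) for v in source_versions if v <= applied]
lost = [versionstamp(v) for v in source_versions if v > applied]

# abort --dstonly: bump to applied + 1 (assumed from the quoted comment). Worst case
# assumed here: the destination then commits at exactly the next versions.
first_new = applied + 1
new_stamps = [versionstamp(v) for v in range(first_new, first_new + 4)]

# Strong part: nothing new sorts at or below what the destination actually applied.
safe_against_applied = min(new_stamps) > max(copied)
# Weak part: stamps already handed out by lost source transactions can be reused.
reused = sorted(set(new_stamps) & set(lost))
```

If a client obtained a versionstamp from one of the lost transactions (for example via get_versionstamp) and stored it elsewhere, the same value may later be assigned to a different transaction on the new primary.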