Delayed FDB DR?

czephyr · June 2, 2025, 3:26pm

I’m trying to set up a delayed FDB DR.
Some of the cluster replicas we run on other software are intentionally delayed. They’re very useful when something messes with the data, for example rows dropped by mistake.

I understand that FDB supports timestamped restore from the continuous backup. We tried it and it works, but some of our big clusters take too long to restore.

A delayed DR does not seem possible at the moment, but it sounds feasible given that the continuous backup already stores mutations in real time: you’d just want your dr_agent to apply them with a delay.

SteavedHams · June 3, 2025, 9:55pm

Making a DR secondary intentionally lag behind the DR primary is not currently a feature but I don’t think it would be difficult to add.

DR mutation log propagation involves essentially two parallel operations:

Log Copy - The secondary cluster reads mutation log data from the primary cluster’s system keyspace, commits it to the secondary cluster’s system keyspace, then deletes it from the primary. Log version ranges are copied in parallel for greater throughput but the version at which all prior log data has been copied is tracked.

Log Apply - The secondary cluster reads the stored mutations from the Log Copy in version order, applies them to itself, then deletes the stored mutations.

Log Copy is what provides off-cluster log durability (on a short delay) so it would be just the Log Apply phase that you would want to delay. This sounds like a simple thing to do but the related code is rather old and uncommented so I don’t see a surgical place to make the change.
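To illustrate the idea only, here is a toy Python model of the two phases with a delay added to Log Apply. It is not FDB code: the class, its fields, and the method names are all hypothetical, the copy is assumed to arrive in version order (the real Log Copy is parallel), and the version rate of roughly one million per second is used as an assumption to turn a wall-clock delay into a version lag.

```python
from dataclasses import dataclass, field

# Assumption: versions advance at roughly one million per second.
VERSIONS_PER_SECOND = 1_000_000


@dataclass
class DelayedSecondary:
    """Toy DR secondary: copies logs promptly, applies them late."""
    apply_delay_versions: int
    copied_log: dict = field(default_factory=dict)  # version -> [(key, value)]
    copied_through: int = 0   # all log data up to this version has been copied
    applied_through: int = 0  # all log data up to this version has been applied
    data: dict = field(default_factory=dict)

    def log_copy(self, version, mutations):
        # Log Copy: store the mutations on the secondary right away.
        # Simplification: versions are assumed to arrive in order.
        self.copied_log[version] = list(mutations)
        self.copied_through = max(self.copied_through, version)

    def log_apply(self, primary_version):
        # Log Apply: apply in version order, but stop at the delayed cutoff.
        cutoff = min(self.copied_through,
                     primary_version - self.apply_delay_versions)
        for version in sorted(v for v in self.copied_log if v <= cutoff):
            for key, value in self.copied_log.pop(version):
                if value is None:          # None stands in for a clear
                    self.data.pop(key, None)
                else:
                    self.data[key] = value
            self.applied_through = version
        return self.applied_through


# Hypothetical scenario: a one-hour delay and an accidental delete.
delay = 3600 * VERSIONS_PER_SECOND
secondary = DelayedSecondary(apply_delay_versions=delay)
secondary.log_copy(1_000, [("row", "v1")])
secondary.log_copy(2_000, [("row", None)])       # the mistaken delete
secondary.log_apply(primary_version=1_500 + delay)
snapshot_before_delete = dict(secondary.data)    # delete is copied, not applied
secondary.log_apply(primary_version=2_000 + delay)
```

In this sketch the mistaken delete is already durable on the secondary after Log Copy, but the secondary's data still holds the row until the primary has moved a full delay past the delete's version.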

If anyone would like to take on this feature I can try to provide some code pointers.

czephyr · June 20, 2025, 4:56pm

Hi.
Thanks for the answer.

I’m not sure I’m familiar enough with the project to contribute but I can take a look.

I finally set up a DR cluster. I tried an fdbdr switch command and it worked, but I noticed the clients don’t get notified about the switch: their cluster file still points to the old primary DB. What’s the reason for this behavior? Am I supposed to redirect them manually by updating their cluster file?
