I’m looking into setting up a delayed DR secondary for FDB.
Some of our clusters running other database software have delayed replicas. They’re very useful when something damages the data, for example rows dropped by mistake.
I understand that FDB supports restoring a continuous backup to a timestamp. We tried it and it works, but some of our big clusters take too long to restore.
This doesn’t seem possible at the moment, but it sounds feasible given that the continuous backup already stores mutations in real time: you’d just want your dr_agent to apply them with a delay.
Making a DR secondary intentionally lag behind the DR primary is not currently a feature, but I don’t think it would be difficult to add.
DR mutation log propagation involves essentially two parallel operations:
Log Copy - The secondary cluster reads mutation log data from the primary cluster’s system keyspace, commits it to the secondary cluster’s system keyspace, then deletes it from the primary. Log version ranges are copied in parallel for greater throughput, but the version up to which all prior log data has been copied is tracked.
Log Apply - The secondary cluster reads the stored mutations from the Log Copy in version order, applies them to itself, then deletes the stored mutations.
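To make the interplay between the two operations concrete, here is a toy Python model. It is not FDB's dr_agent code, and every name in it is hypothetical. Parallel Log Copy tasks finish out of order, a tracker advances the version below which everything has been copied, and Log Apply consumes stored mutations in version order only up to that point. It assumes half-open [begin, end) version ranges.

```python
# Toy model of DR Log Copy / Log Apply; hypothetical names, not FDB code.
class LogCopyTracker:
    """Tracks the version below which every log range has been copied.

    Ranges are half-open [begin, end) and may finish out of order,
    as parallel Log Copy tasks would.
    """

    def __init__(self, begin_version=0):
        self.copied_through = begin_version  # all versions < this are copied
        self._pending = {}  # begin -> end for ranges copied past a gap

    def mark_copied(self, begin, end):
        self._pending[begin] = end
        # Advance only across contiguous ranges; a gap holds the frontier back.
        while self.copied_through in self._pending:
            self.copied_through = self._pending.pop(self.copied_through)
        return self.copied_through


def apply_ready(stored, copied_through, apply_fn):
    """Log Apply: apply stored mutations below the frontier in version order,
    then delete them from the stored log. Returns how many were applied."""
    ready = sorted((m for m in stored if m[0] < copied_through), key=lambda m: m[0])
    for version, mutation in ready:
        apply_fn(version, mutation)
    stored[:] = [m for m in stored if m[0] >= copied_through]
    return len(ready)


if __name__ == "__main__":
    tracker = LogCopyTracker()
    tracker.mark_copied(200, 300)  # finishes first, but [0, 200) is still missing
    frontier = tracker.mark_copied(0, 100)  # frontier moves to 100; [100, 200) is a gap
    stored = [(250, "clear c"), (50, "set a")]
    apply_ready(stored, frontier, lambda v, m: print(v, m))  # applies only version 50
```

The mutation at version 250 has been copied but stays unapplied until the [100, 200) gap is filled, which is why apply has to follow the tracked frontier rather than whatever happens to be stored.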
Log Copy is what provides off-cluster log durability (on a short delay), so only the Log Apply phase would need to be delayed. This sounds like a simple change, but the related code is rather old and uncommented, so I don’t see a surgical place to make it.
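Conceptually, delaying only Log Apply could amount to one extra bound on how far apply may advance. The sketch below is hypothetical; none of these names exist in FDB. It assumes a recent primary version is available as a reference. It also assumes versions advance at roughly 1,000,000 per second, FDB's nominal rate. That rate is not guaranteed, so mapping versions to wall-clock time would be more precise.

```python
# Hypothetical sketch: bound a delayed Log Apply; not FDB's API.
VERSIONS_PER_SECOND = 1_000_000  # assumed nominal FDB version rate


def delayed_apply_limit(copied_frontier, reference_version, delay_seconds,
                        versions_per_second=VERSIONS_PER_SECOND):
    """Exclusive upper bound on versions a delayed Log Apply may apply.

    copied_frontier: every log version below this has been copied (Log Copy).
    reference_version: a recent version from the primary (assumed available).
    """
    delay_versions = int(delay_seconds * versions_per_second)
    # Never pass the copy frontier, and stay delay_versions behind the reference.
    return max(0, min(copied_frontier, reference_version - delay_versions))


def select_for_delayed_apply(stored, limit):
    """Return stored (version, mutation) pairs below the limit, in version order."""
    return sorted((m for m in stored if m[0] < limit), key=lambda m: m[0])


if __name__ == "__main__":
    ref = 5_000_000_000
    limit = delayed_apply_limit(copied_frontier=ref, reference_version=ref,
                                delay_seconds=3600)
    stored = [(ref - 10, "clear rows"), (1_000, "set a"), (1_399_999_999, "set b")]
    print(limit, select_for_delayed_apply(stored, limit))
```

With a one-hour delay, the recent "clear rows" mutation stays in the stored log until it ages past the bound. That leaves a window to stop the apply before a mistaken drop reaches the secondary.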
If anyone would like to take on this feature I can try to provide some code pointers.