Is the database fully offline during a key-range-based restore?

In the documentation for fdbrestore, it says:

It is your responsibility to ensure that no clients are accessing the database while it is being restored. During the restore process the database is in an inconsistent state, and writes that happen during the restore process might be partially or completely overwritten by restored data.

Broadly, I read this to mean that you effectively take the database offline to clients during any restore. If you are restoring a given key range, do you still need to prevent clients accessing the whole database, or is preventing read/write of that key range enough?

Cheers!

In short, preventing writes to the key range targeted by the restore is enough to avoid interfering with or corrupting the restore. Preventing reads from that range also makes sense, unless the application can tolerate reading inconsistent data from it.

During a restore, the destination cluster is in a “locked” state. Clients can still execute both read and write transactions against the database if they acknowledge this locked state by setting the LOCK_AWARE transaction option.
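A minimal sketch of what opting in looks like, assuming the FoundationDB Python bindings, where `db` would come from `fdb.open()`. The binding exposes the option as `tr.options.set_lock_aware()`, and `set_read_lock_aware()` is the read-only variant (READ_LOCK_AWARE). The helper name `lock_aware_transaction` is hypothetical, and it only uses duck-typed calls, so nothing is imported here. The returned transaction still has to go through the usual commit/retry loop.

```python
def lock_aware_transaction(db, read_only=False):
    """Create a transaction that may run while the cluster is locked (e.g. during a restore).

    Hypothetical helper. `db` is assumed to behave like a FoundationDB
    Python-binding Database: only db.create_transaction() and the
    transaction's `options` object are used.
    """
    tr = db.create_transaction()
    if read_only:
        # READ_LOCK_AWARE: the transaction may only read from a locked database.
        tr.options.set_read_lock_aware()
    else:
        # LOCK_AWARE: the transaction may read and write a locked database.
        tr.options.set_lock_aware()
    return tr
```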

Hello all, I wanted to resurrect this topic. We’re continuing to develop our backup and restore story at IBM Cloudant.

It’s very desirable for us to support partial keyspace restores against FDB without locking out our Cloudant layer from interactions. We have good control of the key range in question and naturally the key range needs to be clear before a restore can take place.

I wanted to re-affirm that it’s sane to use LOCK_AWARE in this scenario, to allow our layer to continue transactions outside the key range being restored. We would not intentionally read/write from the range being restored during restoration.

If we decide to integrate lock-awareness into our layer (for use only during partial keyspace restores), I anticipate we’ll try to block reads and writes to any key space we’re actively restoring, at the layer level.

Then, to perform a partial keyspace restore, we’d configure our layer to enter a LOCK_AWARE “mode”, lock out the key space with logic in our layer, and then begin the restore. Afterwards, we’d disable the lock-aware “recovery mode” of our layer.
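The workflow described above could look roughly like this on the layer side. This is a minimal sketch under stated assumptions: `RestoreMode`, `restoring`, `check_key` and `prepare` are hypothetical names, ranges are half-open `(begin, end)` byte-string pairs, and `tr` is assumed to behave like a FoundationDB Python-binding transaction (only `tr.options.set_lock_aware()` is called). FDB itself only locks the whole cluster, so the per-range blocking here is purely the layer’s own logic.

```python
from contextlib import contextmanager


class RestoreMode:
    """Hypothetical layer-side 'recovery mode' for partial keyspace restores.

    While active, new transactions are opted in to LOCK_AWARE so the layer
    keeps working against the locked cluster, and the layer itself refuses
    to touch keys inside the ranges being restored.
    """

    def __init__(self):
        self.active = False
        self._ranges = []  # (begin, end) byte-string pairs, end exclusive

    @contextmanager
    def restoring(self, ranges):
        """Enter recovery mode for the given ranges for the duration of a restore."""
        self._ranges = [(bytes(b), bytes(e)) for b, e in ranges]
        self.active = True
        try:
            yield self
        finally:
            # Leave recovery mode even if the restore fails.
            self.active = False
            self._ranges = []

    def check_key(self, key):
        """Raise if `key` falls inside a range that is currently being restored."""
        for begin, end in self._ranges:
            if begin <= key < end:
                raise PermissionError(f"{key!r} is in a range being restored")

    def prepare(self, tr):
        """Apply LOCK_AWARE to `tr` only while recovery mode is active."""
        if self.active:
            tr.options.set_lock_aware()
        return tr
```

Keeping lock-awareness behind `prepare` means normal operation is unchanged, and the `finally` clause makes it hard to leave the layer stuck in recovery mode after a failed restore.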

Sane?

It sounds good to me from the new restore system perspective.

The only thing on my mind, which you may have considered, is that a restore can target multiple key ranges (possibly in parallel). So your layer may need to be aware of that as well.
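Because a restore may cover several key ranges at once, a layer-level check probably needs to test range reads, not just single keys, against every range in flight. A minimal sketch, assuming half-open `[begin, end)` byte-string ranges (the convention FDB range reads use); `overlapping_restore_ranges` and `range_read_allowed` are hypothetical names.

```python
def overlapping_restore_ranges(begin, end, restoring):
    """Return the ranges in `restoring` that intersect the half-open range [begin, end).

    `restoring` is a list of half-open (begin, end) byte-string pairs, possibly
    several ranges being restored in parallel. Two half-open ranges intersect
    exactly when each one starts before the other ends.
    """
    return [(b, e) for b, e in restoring if begin < e and b < end]


def range_read_allowed(begin, end, restoring):
    """True if a read of [begin, end) touches none of the ranges being restored."""
    return not overlapping_restore_ranges(begin, end, restoring)
```

Adjacent ranges such as `[b"a", b"c")` and `[b"c", b"m")` are not treated as overlapping, which matches FDB’s exclusive range ends.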

This is a sane thing to do given the current state of things. Should FDB locking capability evolve such that specific key ranges are lockable then a restore operation would only lock the range(s) being restored and you would no longer have to set LOCK_AWARE to work with keys outside of those range(s).

This is a possibly relevant issue:

Thank you all for the additional input. You’ll hear more from us as testing continues. Cheers!

Hello all,

A couple more questions on this, please.

  1. What are the implications of leaving LOCK_AWARE mode on at all times?
  2. Are there other FDB-induced causes for the cluster lock, other than restore?

We’re trying to determine whether we need LOCK_AWARE mode enabled only during a restore, or whether we can simply leave it enabled as a matter of course. We would also like to better understand what else, apart from deliberately taking the lock, can cause the cluster to be locked.

Thank you in advance