Is the database fully offline during a key-range-based restore?

mikerhodes · January 17, 2019, 4:02pm

In the documentation for fdbrestore, it says:

It is your responsibility to ensure that no clients are accessing the database while it is being restored. During the restore process the database is in an inconsistent state, and writes that happen during the restore process might be partially or completely overwritten by restored data.

Broadly, I read this to mean that you effectively take the database offline to clients during any restore. If you are restoring a given key range, do you still need to prevent clients accessing the whole database, or is preventing read/write of that key range enough?

Cheers!

SteavedHams · January 18, 2019, 1:08am

In short, preventing writes to the key range targeted by the restore is enough to avoid interfering with / corrupting the restore. Preventing reads also makes sense to do, unless for some reason the application doesn’t care about the inconsistency of the ranges it reads.

During a restore, the destination cluster is in a “locked” state. Clients can still execute both read and write transactions against the database if they acknowledge this locked state by setting the LOCK_AWARE transaction option.

ksnavely · February 21, 2020, 6:17pm

Hello all, I wanted to resurrect this topic. We’re continuing to develop our backup and restore story at IBM Cloudant.

It’s very desirable for us to support partial keyspace restores against FDB without locking out our Cloudant layer from interactions. We have good control of the key range in question and naturally the key range needs to be clear before a restore can take place.

I wanted to re-affirm that it’s sane to use LOCK_AWARE in this scenario, to allow our layer to continue transactions outside the key range being restored. We would not intentionally read/write from the range being restored during restoration.

If we decide to integrate lock-awareness into our layer (for use only during partial key space restores), I anticipate we’ll try to block reads/writes at the layer level to any key space we’re actively restoring.

Then to perform a partial keyspace restore, we’d configure our layer to enter LOCK_AWARE ‘mode’, lock out the key space with logic in our layer, and then begin the restore activity. Afterwards, we’d disable the lock-aware “recovery mode” of our layer.

Sane?

mengxu · February 21, 2020, 9:05pm

It sounds good to me from the new restore system perspective.

The only thing in my mind, which you may have considered, is that restore can restore to multiple key-ranges (maybe in parallel). So your layer may need to be aware of that as well.
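The layer-side bookkeeping for this could look something like the sketch below. Every name in it (`RestoreGuard`, `RangeLockedError`, `begin_restore`, and so on) is hypothetical and is not part of FDB or of any existing layer. It tracks several half-open key ranges `[begin, end)` at once and rejects keys or ranges that overlap any of them.

```python
import threading


class RangeLockedError(Exception):
    """Raised when an operation touches a range that is being restored."""


class RestoreGuard:
    """Process-local registry of key ranges currently being restored."""

    def __init__(self):
        self._ranges = set()          # {(begin, end)}, half-open byte ranges
        self._mutex = threading.Lock()

    def begin_restore(self, begin, end):
        if not begin < end:
            raise ValueError("begin must sort before end")
        with self._mutex:
            self._ranges.add((begin, end))

    def end_restore(self, begin, end):
        with self._mutex:
            self._ranges.discard((begin, end))

    @property
    def recovery_mode(self):
        # True while at least one range is being restored; a layer could use
        # this to decide whether its transactions should be lock-aware.
        with self._mutex:
            return bool(self._ranges)

    def check_key(self, key):
        with self._mutex:
            for begin, end in self._ranges:
                if begin <= key < end:
                    raise RangeLockedError(key)

    def check_range(self, begin, end):
        # Two half-open ranges overlap when each starts before the other ends.
        with self._mutex:
            for r_begin, r_end in self._ranges:
                if begin < r_end and r_begin < end:
                    raise RangeLockedError((begin, end))
```

A layer would call `check_key` or `check_range` before issuing reads or writes, and call `begin_restore` / `end_restore` around each restore. This guard only lives in one process; a layer with several processes would need to share the set of restoring ranges some other way.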

SteavedHams · February 21, 2020, 9:48pm

This is a sane thing to do given the current state of things. Should FDB locking capability evolve such that specific key ranges are lockable then a restore operation would only lock the range(s) being restored and you would no longer have to set LOCK_AWARE to work with keys outside of those range(s).

ajbeamon · February 21, 2020, 9:55pm

This is a possibly relevant issue:

ksnavely · February 21, 2020, 10:42pm

Thank you all for the additional input. You’ll hear more from us as testing continues. Cheers!

agill · July 10, 2024, 5:15pm

Hello all,

A couple more questions on this, please:

What are the implications of leaving LOCK_AWARE mode on at all times?
Are there other FDB-induced causes for the cluster lock, other than restore?

We’re trying to determine whether we need to have the LOCK_AWARE mode enabled only during a restore, or whether it is something we can just leave enabled as a matter of course. We would also like to better understand what else can cause a cluster lock, apart from taking the lock on purpose, of course.

Thank you in advance
