fdbbackup continuous backup internals

Reference: foundationdb/design/backup.md at main · apple/foundationdb · GitHub.

Hi, I’m running fdbbackup with the -z (continuous) option and snapshot-interval = 1 day; for example, it takes a snapshot at 01:00 daily and stores it in S3.
At 10:00 AM, I need to restore from the snapshot.

  1. Assume continuous backup is running normally.
    How can fdbrestore restore the latest data changes from 01:00 until 10:00 AM? How do mutation logs and differential logs work? The data format describes only mutation logs, so I wonder how differential logs work.

  2. If something goes wrong and the continuous backup dies at 08:00 AM, I can still restore data up to 08:00 AM, right?

It would be nice if the documentation could be updated.

The continuous backup does not snapshot the database at a specific time; rather, it takes a snapshot slowly over the time interval, so that every snapshot interval there will be a complete snapshot in the backup. Mutation logs are always written to the backup. You can restore the cluster to any point in time after the first snapshot has completed. Restore to some version V requires:

  • A snapshot which ended at or before V
  • Mutation logs covering the start version of the snapshot through V
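The two requirements above can be sketched as a small check. This is only an illustration, not how fdbrestore is implemented: the `Snapshot` and `LogRange` types, the function names, and the choice to treat log ranges as half-open `[begin, end)` are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class Snapshot(NamedTuple):
    # Hypothetical record: versions at which the snapshot started and ended.
    begin_version: int
    end_version: int


class LogRange(NamedTuple):
    # Hypothetical record: one mutation log file, assumed to cover [begin, end).
    begin_version: int
    end_version: int


def logs_cover(logs: List[LogRange], begin: int, target: int) -> bool:
    """True if the logs cover every version from begin through target with no gap."""
    covered_to = begin
    for log in sorted(logs):
        if log.begin_version > covered_to:
            break  # gap in the mutation logs
        covered_to = max(covered_to, log.end_version)
    return covered_to > target  # end is exclusive, so target itself must be inside


def pick_snapshot(snapshots: List[Snapshot], logs: List[LogRange],
                  target: int) -> Optional[Snapshot]:
    """Return the latest snapshot usable for restoring to target, or None."""
    candidates = [s for s in snapshots if s.end_version <= target]
    for snap in sorted(candidates, key=lambda s: s.end_version, reverse=True):
        if logs_cover(logs, snap.begin_version, target):
            return snap
    return None
```

If the newest snapshot that ended at or before V lacks log coverage, the sketch falls back to an older one, which is why keeping the mutation logs from an older snapshot's start version matters.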

To see what versions you can restore to, use fdbbackup describe -d <url>. Restore will automatically find and use the correct files based on the target restore version you specify.

How do mutation logs and differential logs work?

Mutation logs are differential. A mutation log is a log of mutations aka changes in version order. The backup contains mutation log files and keyspace snapshot files covering specific key ranges of the database at specific versions.

  1. If something goes wrong and the continuous backup dies at 08:00 AM, I can still restore data up to 08:00 AM, right?

Yes, provided that the backup had reached a restorable point and you did not use fdbbackup expire or otherwise delete the older files you would need.

For example, if you started your backup at 01:00 with a 24 hour snapshot interval, then to restore to 08:00 on a Wednesday the restore process will use the snapshot which was created between 01:00 on Tuesday and 01:00 on Wednesday along with the mutation logs from 01:00 on Tuesday through your restore target of 08:00 on Wednesday.

The safest way to delete old data from a backup is with fdbbackup expire because you can tell it what restorable time period you want and it will refuse to delete data which would affect restorability for that period.

Hi @SteavedHams, thanks for your reply. I would like to give more details about the situation in which I have to delete some specific snapshots.

Previously I ran continuous backup with --snapshot-interval = 1 hour, so for a single date there are 23 to 24 backup snapshots. I want to delete them to save storage but have to keep 1 snapshot per date (for compliance purposes). I know fdbbackup expire is safe, but I cannot use it because expire does not keep any snapshots.

Here is what I understand about the continuous snapshot process. After snapshot version x finishes:

    1. A new `kvranges` folder (x + 1) is created and captures ongoing point-in-time subspace.
        
        Ref: https://github.com/apple/foundationdb/blob/main/fdbclient/FileBackupAgent.actor.cpp#L3968
        
    2. Mutation logs are written continuously
    3. Create snapshot manifest x + 1, reference all `kvranges` folder (x + 1):
        
        Ref: https://github.com/apple/foundationdb/blob/main/fdbclient/BackupContainerFileSystem.actor.cpp#L1214

This means an inconsistent snapshot is stored as 2 entities: 1 snapshot manifest file and exactly 1 kvranges folder. So, to safely delete a specific snapshot, you can delete the snapshot manifest file and its corresponding kvranges folder. For mutation logs, I keep everything from the initial snapshot onward.

Test procedure:

1. Setup: run continuous backup until the backup contains 2 snapshot files and 3 kvranges folders.
2. Delete the snapshot file and the corresponding kvranges folder of the 2nd snapshot. Keep mutation logs.
3. Restore: success.

I think this approach is safe. It would be nice if you could confirm that, so that anyone with the same issue can use this method to solve their problem.

If I understand correctly, you are proposing to save space by deleting intermediate completed snapshots, meaning not the oldest completed snapshot or the most recent completed/in-progress snapshot but some or all of the snapshots in between. Since fdbbackup expire does not support this, you would have to do this externally. Yes, it would be safe to do.

Some clarifications:

3. Create snapshot manifest x + 1, reference all `kvranges` folder (x + 1):

This is not quite correct - the snapshot manifest file does not necessarily reference all kvrange files in the folders for its version range. This is because spurious backup subtask failures can cause additional snapshot files to exist in the backup which are not referenced by the backup metadata. These additional files will overlap in key range with the files produced later by the final successful runs of backup subtasks.

The snapshot manifest produced at the end of the snapshot is a set of files containing mutually exclusive key ranges which together cover exactly the key range which the backup was targeting. The restore process requires the snapshot files to be mutually exclusive, so the backup snapshot process produces the manifest to fulfill this requirement.
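The "mutually exclusive and exactly covering" property can be illustrated with a short check. This is a sketch only: it assumes each file's key range is a half-open `(begin_key, end_key)` pair of byte strings, and the function name is made up for the example rather than taken from the FoundationDB code.

```python
from typing import List, Tuple

KeyRange = Tuple[bytes, bytes]  # assumed half-open: [begin_key, end_key)


def is_exact_partition(ranges: List[KeyRange],
                       target_begin: bytes, target_end: bytes) -> bool:
    """True if the ranges are disjoint and together cover exactly the target range."""
    cursor = target_begin
    for begin, end in sorted(ranges):
        if begin != cursor or end <= begin:
            return False  # gap, overlap, or empty range
        cursor = end
    return cursor == target_end
```

A leftover file from a failed subtask would show up here as an overlap, which is why such files are left out of the manifest.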

Any kvrange files not referenced by a snapshot manifest will be cleaned up when fdbbackup expire covers the file’s version range. The expiration process deletes files based on the versions in their name; it does not use the snapshot manifest contents to decide what to delete.

To delete intermediate snapshots, I suggest you use fdbbackup describe to get the list of snapshots and whether each one is complete, then choose the ones you will delete.
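Once you have the snapshot list, the selection for the "keep one per day" case discussed in this thread could look roughly like the sketch below. It does not parse fdbbackup describe output; the input is assumed to be a hand-built list of `(snapshot_id, end_time)` pairs, and both that shape and the function name are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def snapshots_to_delete(snapshots: List[Tuple[str, datetime]]) -> List[str]:
    """Return ids of intermediate snapshots that are candidates for deletion.

    Always kept: the first snapshot of each calendar day (which includes the
    oldest snapshot) and the most recent snapshot, which may still be in progress.
    """
    ordered = sorted(snapshots, key=lambda s: s[1])
    if not ordered:
        return []
    keep = {ordered[-1][0]}  # never touch the most recent snapshot
    seen_days = set()
    for snap_id, end_time in ordered:
        day = end_time.date()
        if day not in seen_days:
            seen_days.add(day)
            keep.add(snap_id)  # first snapshot of this day
    return [snap_id for snap_id, _ in ordered if snap_id not in keep]
```

Only snapshot manifests and their kvranges files would be removed for the returned ids. The mutation logs stay in place, because restoring from an older remaining snapshot needs logs from that snapshot's start version through the restore target.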