fdbbackup continuous backup internals

Reference: foundationdb/design/backup.md at main · apple/foundationdb · GitHub.

Hi, I’m running fdbbackup with the -z (continuous) option and a snapshot interval of 1 day. For example, a snapshot is taken at 01:00 daily and stored in S3.
At 10:00 AM, I need to restore from the snapshot.

  1. Assume continuous backup is running normally.
    How can fdbrestore restore the data changes made between 01:00 and 10:00 AM? How do mutation logs and differential logs work? The data format describes only mutation logs, so I wonder how differential logs fit in.

  2. If something goes wrong and the continuous backup dies at 08:00 AM, I can still restore data up to 08:00 AM, right?

It would be nice if the documentation could be updated.

The continuous snapshot does not snapshot the database at a specific time. Rather, it takes a snapshot slowly over the time interval, so that in every snapshot interval there will be a complete snapshot in the backup. Mutation logs are always written to the backup. You can restore the cluster to any point in time after the first snapshot has completed. Restore to some version V requires:

  • A snapshot which ended at or before V
  • Mutation logs covering the start version of the snapshot through V
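
The two requirements above can be expressed as a small check. This is only an illustrative sketch: the `Snapshot` class, `logs_cover`, and `is_restorable` are hypothetical names, and the model assumes each mutation log file covers a contiguous (begin, end) version range. It is not how fdbrestore is implemented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    begin_version: int  # version at which the snapshot started (hypothetical model)
    end_version: int    # version at which the snapshot completed


def logs_cover(log_ranges: List[Tuple[int, int]], begin: int, end: int) -> bool:
    """Return True if the log ranges jointly cover begin..end with no gap."""
    covered_to = begin
    for lo, hi in sorted(log_ranges):
        if lo > covered_to:
            break  # gap: nothing covers the versions between covered_to and lo
        covered_to = max(covered_to, hi)
        if covered_to >= end:
            return True
    return covered_to >= end


def is_restorable(snapshots: List[Snapshot],
                  log_ranges: List[Tuple[int, int]],
                  target: int) -> bool:
    """A target version is restorable if some snapshot ended at or before it
    and the logs cover that snapshot's start version through the target."""
    return any(
        s.end_version <= target and logs_cover(log_ranges, s.begin_version, target)
        for s in snapshots
    )
```

With one snapshot spanning versions 100 to 200 and logs (100, 150) and (150, 300), version 250 is restorable, while version 150 is not because the snapshot had not finished yet.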

To see what versions you can restore to, use fdbbackup describe -d <url>. Restore will automatically find and use the correct files based on the target restore version you specify.

How do mutation logs and differential logs work?

Mutation logs are differential. A mutation log is a log of mutations (that is, changes) in version order. The backup contains mutation log files and keyspace snapshot files covering specific key ranges of the database at specific versions.

  2. If something goes wrong and the continuous backup dies at 08:00 AM, I can still restore data up to 08:00 AM, right?

Yes, provided that the backup had reached a restorable point and you did not run fdbbackup expire on, or otherwise delete, the older files you would need.

For example, if you started your backup at 01:00 with a 24-hour snapshot interval, then to restore to 08:00 on a Wednesday, the restore process will use the snapshot that was created between 01:00 on Tuesday and 01:00 on Wednesday, along with the mutation logs from 01:00 on Tuesday through your restore target of 08:00 on Wednesday.
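
The Tuesday/Wednesday example above can be worked through in a few lines. This is a sketch under assumptions: `pick_snapshot` and `required_logs` are hypothetical helpers, the calendar dates are arbitrary (2024-01-02 happens to be a Tuesday), and it assumes restore prefers the most recently completed snapshot that qualifies.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Window = Tuple[datetime, datetime]  # (snapshot start, snapshot end); hypothetical model


def pick_snapshot(snapshots: List[Window], target: datetime) -> Optional[Window]:
    """Return the most recently completed snapshot that ended at or before target."""
    done = [s for s in snapshots if s[1] <= target]
    return max(done, key=lambda s: s[1]) if done else None


def required_logs(snapshot: Window, target: datetime) -> Window:
    """Mutation logs are needed from the snapshot's start through the target."""
    return (snapshot[0], target)


snapshots = [
    (datetime(2024, 1, 1, 1), datetime(2024, 1, 2, 1)),  # Mon 01:00 to Tue 01:00
    (datetime(2024, 1, 2, 1), datetime(2024, 1, 3, 1)),  # Tue 01:00 to Wed 01:00
    (datetime(2024, 1, 3, 1), datetime(2024, 1, 4, 1)),  # Wed 01:00 to Thu 01:00
]
target = datetime(2024, 1, 3, 8)  # Wednesday 08:00

chosen = pick_snapshot(snapshots, target)   # the Tuesday-to-Wednesday snapshot
log_range = required_logs(chosen, target)   # Tuesday 01:00 through Wednesday 08:00
```

The Wednesday-to-Thursday snapshot is skipped because it ends after the restore target.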

The safest way to delete old data from a backup is with fdbbackup expire because you can tell it what restorable time period you want and it will refuse to delete data which would affect restorability for that period.
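
To illustrate why a requested restorable period limits what can be deleted, here is a simplified model of that safety check. The function names are hypothetical, snapshots are modeled as (begin, end) version pairs, and this is not fdbbackup's actual logic. The idea is that data can only be removed up to the start of a snapshot that finished at or before the point you still want to be restorable.

```python
from typing import List, Optional, Tuple


def max_safe_expire_version(snapshots: List[Tuple[int, int]],
                            restorable_after: int) -> Optional[int]:
    """Latest version before which data could be deleted while keeping every
    version >= restorable_after restorable. None means nothing is safe to delete."""
    usable = [begin for begin, end in snapshots if end <= restorable_after]
    return max(usable) if usable else None


def expire_allowed(snapshots: List[Tuple[int, int]],
                   expire_before: int,
                   restorable_after: int) -> bool:
    """Refuse (return False) if deleting data before expire_before would remove
    files needed to restore to restorable_after."""
    limit = max_safe_expire_version(snapshots, restorable_after)
    return limit is not None and expire_before <= limit
```

For snapshots (0, 100), (100, 200), (200, 300) and a requested restorable point of 250, deleting data before version 100 is allowed in this model, while deleting data before version 150 is refused.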

Hi @SteavedHams, thanks for your reply. I would like to give more details about my situation, in which I have to delete some specific snapshots.

Previously I ran continuous backup with --snapshot-interval = 1 hour, so for a single date there are 23 to 24 backup snapshots. I want to delete most of them to save storage, but I have to keep 1 snapshot per date (for compliance purposes). I know fdbbackup expire is safe, but I cannot use it because expire does not keep any snapshots.

Here is what I understand about the continuous snapshot process. After snapshot version x finishes:

    1. A new `kvranges` folder (x + 1) is created and captures the ongoing point-in-time subspace.
        
        Ref: https://github.com/apple/foundationdb/blob/main/fdbclient/FileBackupAgent.actor.cpp#L3968
        
    2. Mutation logs are written continuously
    3. A snapshot manifest (x + 1) is created, which references the `kvranges` folder (x + 1):
        
        Ref: https://github.com/apple/foundationdb/blob/main/fdbclient/BackupContainerFileSystem.actor.cpp#L1214

It means an inconsistent snapshot is stored as 2 entities: 1 snapshot manifest file and exactly 1 kvranges folder. So, to safely delete a specific snapshot, you can delete its snapshot manifest file and the corresponding kvranges folder. As for mutation logs, I keep all of them from the initial snapshot onward.

Test procedure:

1. Setup: run a continuous backup until it has written 2 snapshot files and 3 kvranges folders.
2. Delete the snapshot file and the corresponding kvranges folder of the 2nd snapshot. Keep the mutation logs.
3. Restore: success.

I think this approach is safe. It would be nice if you could confirm it, so that anyone with the same issue can use this method to solve their problem.
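
For anyone wanting to plan such a thinning pass, here is a minimal sketch of choosing which snapshots to keep, one per calendar day. It only produces a plan and does not touch any backup files. `plan_thinning` is a hypothetical helper, snapshots are identified only by their end time, and keeping the last snapshot of each day is an arbitrary choice. Whether deleting the corresponding manifest and kvranges folder is safe is exactly the open question above, so treat this as a planning aid only.

```python
from collections import defaultdict
from datetime import datetime
from typing import List, Tuple


def plan_thinning(snapshot_ends: List[datetime]) -> Tuple[List[datetime], List[datetime]]:
    """Split snapshots into (keep, delete), keeping the last snapshot of each day."""
    by_day = defaultdict(list)
    for end in snapshot_ends:
        by_day[end.date()].append(end)
    keep = sorted(max(ends) for ends in by_day.values())
    kept = set(keep)
    delete = sorted(e for e in snapshot_ends if e not in kept)
    return keep, delete


# Hourly snapshots on one day plus a single snapshot on the next day.
ends = [datetime(2024, 1, 2, h) for h in (1, 2, 3)] + [datetime(2024, 1, 3, 1)]
keep, delete = plan_thinning(ends)
```

In this example the 03:00 snapshot of the first day and the only snapshot of the second day are kept, and the 01:00 and 02:00 snapshots are listed for deletion. Mutation logs are not part of the plan because, in the approach described above, they are all kept.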