Trade-offs of -s <DURATION> / --snapshot_interval <DURATION> for continuous mode backups?

Hi All,

I’ve recently built 6.0.16 from source in order to get access to the S3/blobstore backup fixes. We’re running it in staging at the moment and it seems to be working well. Using a relatively small working set for testing (~20GB), we’ve successfully verified restoring from backups. However, I am curious about the potential trade-offs of the duration switch in continuous mode when restoring much larger working sets.

I’ve read the post “Trying to understand the backup mechanism better”. From the description, this feature apparently allows you to amortize the cost of backups over the specified window, with the default being 10 days, which is great.

I’m wondering, though, whether this will have a significant impact on mean time to rebuild during an fdbrestore. So far, on small data sets, the restore process has been relatively quick. Does anyone have experience with continuous mode backups on larger working sets, and if so, is there any relationship between mean time to restore and --snapshot_interval?

It really depends on your write workload. During a restore, the most recent complete snapshot must be written to the database. In addition, all mutations that occurred during that snapshot and afterwards, up to your restore point, must be replayed. The more mutations you have from this time period, the longer the restore will take.

For example, a worst-case scenario would be a database where nearly all keys are deleted or modified about once every day while the database size stays relatively stable. In this case, a snapshot window of 10 days and a restore operation on day 19 would result in 20 ‘databases’ worth of data being written by the restore: one database-sized set of mutations for each of the 19 days, plus the snapshot itself.

You would get a faster restore time with 12-hour snapshots; however, this will also cost you more read ops on your cluster and more write ops on your backup medium.
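To put rough numbers on that side of the trade-off, here is a tiny back-of-the-envelope sketch. The names and figures are made up for illustration; this is just arithmetic on the behavior described above, not anything from the fdbbackup tooling:

```python
# Illustrative arithmetic only: the ongoing cost of snapshotting is roughly
# one full copy of the database read from the cluster (and written to the
# backup medium) per snapshot window.

def snapshot_read_gb_per_day(db_size_gb: float, snapshot_window_days: float) -> float:
    """Approximate GB/day read from the cluster just to keep taking snapshots."""
    return db_size_gb / snapshot_window_days

db_size_gb = 1000.0  # arbitrary example size

print(snapshot_read_gb_per_day(db_size_gb, 10.0))  # 100.0 GB/day with the default 10-day window
print(snapshot_read_gb_per_day(db_size_gb, 0.5))   # 2000.0 GB/day with 12-hour snapshots
```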

Thanks @SteavedHams! I will follow up with some numbers after we perform a bit more testing.

Hi, I’m a little bit confused about the example and about what “snapshot window” means.

There are three directories under the backup directory: “kvranges”, “logs”, and “snapshots”. I think the “kvranges” and “logs” files are somehow immutable, while the “snapshots” files are in an easy-to-append format. Please correct me if I’m wrong.

So does the snapshot window mean that after 10 days the snapshots are converted into the immutable format? If I understand it correctly, shouldn’t your example be one database-sized set in the immutable format and 9 database-sized sets of snapshots?

In short, no. I’ll just explain how it works.

All files written to a backup are immutable. This is because the backup folder layout is the same on a local filesystem or on an object storage service using the S3 API and the latter does not allow file appends.

There is no “conversion” of anything, at any time, during the backup process; there are only points at which more immutable files are written to the backup. Those files are later deleted when expiring backup data before a point in time or when deleting an entire backup.

Mutation logs, in version order, are continuously written to files with the logs/ folder structure such that the folder name and filename describe the version range that the log file covers. There are about 5,000 of these written per day, and their sizes are determined by the write load of the cluster.

During a Snapshot, many small chunks of Key-Value pairs are written to the kvranges/ folder at a rate targeting the Snapshot Window as a total completion time. Each file represents a sorted, contiguous set of key-value pairs read at a consistent version. The folder name indicates which Snapshot the file is a part of, and the filename indicates what version its contents were read at.
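A simplified mental model of those chunks, as a sketch. The class and field names here are purely illustrative; they are not the real on-disk format or naming scheme:

```python
# What conceptually lands under kvranges/ during one snapshot.
from dataclasses import dataclass

@dataclass
class KVRangeChunk:
    snapshot_folder: str   # which Snapshot's subfolder the file lives in
    begin_key: bytes       # first key covered (inclusive)
    end_key: bytes         # end of the key range covered (exclusive)
    read_version: int      # consistent version the chunk was read at

# Chunks are written piece by piece over the Snapshot Window, so different
# chunks of the same snapshot are read at different versions.
chunks = [
    KVRangeChunk("snapshot-000", b"",  b"m",    read_version=1_000_000),
    KVRangeChunk("snapshot-000", b"m", b"\xff", read_version=5_000_000),
]
```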

At the end of a Snapshot, a list of mutually exclusive files covering the backup’s target key range(s) is written in JSON format to a new file in the snapshots/ directory. This snapshot “manifest” file exists because the list is not simply all of the files in the kvranges/ folder under the snapshot’s subfolder *.
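Conceptually, that manifest records the snapshot’s version range plus the mutually exclusive list of kvrange files that make it up. The sketch below is only a schematic in Python; the real JSON schema and field names used by fdbbackup are different:

```python
import json

# Schematic of what a snapshot "manifest" conceptually contains.
manifest = {
    "begin_version": 1_000_000,
    "end_version": 9_000_000,
    "kvrange_files": [
        # non-overlapping key ranges, each read at its own version
        {"path": "kvranges/snapshot-000/chunk-a", "begin_key": "a", "end_key": "m"},
        {"path": "kvranges/snapshot-000/chunk-b", "begin_key": "m", "end_key": "z"},
    ],
}
print(json.dumps(manifest, indent=2))
```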

To perform a restore, you need (see the sketch after this list):

  • A Snapshot file, which describes a Snapshot whose start version and end version are approximately one Snapshot Window apart in time, and whose end version is less than or equal to your restore target version
  • All of the kvrange files that are listed in the Snapshot file
  • All of the log files from the start version of the Snapshot file through your restore target version
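Here is a small sketch of that selection rule: use the most recent snapshot whose end version is at or before the restore target, then take its kvrange files plus every log file covering versions from the snapshot’s start through the target. fdbrestore does this internally; the code below is only a model with made-up types:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SnapshotManifest:
    begin_version: int
    end_version: int
    kvrange_files: List[str]

@dataclass
class LogFile:
    begin_version: int
    end_version: int
    path: str

def files_needed_for_restore(snapshots: List[SnapshotManifest],
                             logs: List[LogFile],
                             target_version: int) -> Tuple[List[str], List[str]]:
    # Most recent complete snapshot ending at or before the restore target.
    usable = [s for s in snapshots if s.end_version <= target_version]
    if not usable:
        raise ValueError("no complete snapshot ends at or before the target version")
    snap = max(usable, key=lambda s: s.end_version)
    # Every log file overlapping [snapshot start, restore target].
    needed_logs = [l.path for l in logs
                   if l.end_version >= snap.begin_version and l.begin_version <= target_version]
    return snap.kvrange_files, needed_logs
```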

So going back to the example: if your Snapshot Window is 10 days starting at day 0, you want to restore to a point at day 19, and your database size holds steady at X but you modify/replace X amount of data each day, then your Snapshot file will list kvrange files describing approximately X of data, and each of the 19 days of mutation logs you need will contain about X of data (the amount modified per day). Your restore process will therefore have to read about 20 * X of data.
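The same example as plain arithmetic (the number is arbitrary, just for illustration):

```python
X = 1000.0                        # GB: steady-state database size, also the amount modified per day

snapshot_data = X                 # one complete snapshot's worth of kvrange files
log_data = 19 * X                 # mutation logs from day 0 through day 19
print(snapshot_data + log_data)   # 20000.0 GB, i.e. 20 * X read by the restore
```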

(*) Due to the nature of working with two separate storage systems - the FDB cluster, which contains backup task execution state, and the backup medium, where files are written - it is fairly common for errors to cause a backup task (responsible for writing one file to logs/ or one file to kvranges/) to succeed in writing a file to the backup medium but then fail to mark the task as completed in the database. In the case of kvranges/ files, this can result in files in the backup which have overlapping key ranges. This is not valid input for the restore process, so the backup process keeps a list in the database of a mutually exclusive kvrange file set and then flushes it to a JSON file in snapshots/ at the end of the snapshot.
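A sketch of the idea behind that bookkeeping: retried tasks can leave extra kvrange files with overlapping key ranges, so one mutually exclusive set is tracked. The greedy filter below only illustrates the concept; it is not the actual logic fdbbackup uses:

```python
from typing import List, Tuple

def mutually_exclusive(files: List[Tuple[bytes, bytes, str]]) -> List[str]:
    """files: (begin_key, end_key, path) in the order tasks were recorded as
    complete. Returns the paths whose key ranges do not overlap any
    previously accepted file."""
    accepted: List[Tuple[bytes, bytes, str]] = []
    for begin, end, path in files:
        overlaps = any(begin < a_end and a_begin < end for a_begin, a_end, _ in accepted)
        if not overlaps:
            accepted.append((begin, end, path))
    return [p for _, _, p in accepted]

print(mutually_exclusive([
    (b"a", b"m", "chunk-1"),
    (b"a", b"m", "chunk-1-retry"),  # duplicate left behind by a retried task
    (b"m", b"z", "chunk-2"),
]))
# ['chunk-1', 'chunk-2']
```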

Hi, first of all, thanks very much for your detailed explanation! But I’m still confused about kvranges.

For the example, are the kvranges written over 10 days? What kind of data is written during those 10 days? Do you mean the cluster is trying to write the initial X of data from day 0 into kvranges over the course of 10 days? And the logs contain all the modifications from day 0 through day 19?

But if it works like that, doesn’t the cluster also need to keep the initial X of data around for ten days? Only after all of that data has been written into the kvranges could the cluster safely discard the old data and apply the mutation logs.

And is the snapshot window only meaningful once? Let’s say I want to restore the data at day 21: will I still have only one snapshot file and log files of 21 * X size?

The ranges component is essentially just reading the data from the database in chunks at various points throughout the snapshot window. The data does not all need to be read at the same version, but you won’t be able to restore a backup to a version before that of the last range copied (at least for the first snapshot). For a restore, mutations are played over the range data starting after the version of each range.
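A sketch of that replay rule: each range chunk was read at its own version, so during restore only mutations newer than a chunk’s read version are applied to keys inside that chunk. This is a toy model with ‘set’ mutations only, not fdbrestore’s implementation:

```python
from typing import Dict, List, Tuple

Chunk = Tuple[bytes, bytes, int, Dict[bytes, bytes]]  # begin, end, read_version, data
Mutation = Tuple[int, bytes, bytes]                   # version, key, new value

def replay(chunks: List[Chunk], mutations: List[Mutation],
           target_version: int) -> Dict[bytes, bytes]:
    db: Dict[bytes, bytes] = {}
    for _, _, _, data in chunks:
        db.update(data)                 # load the snapshot's range data first

    for version, key, value in mutations:
        if version > target_version:
            break                       # mutations arrive in version order
        for begin, end, read_version, _ in chunks:
            if begin <= key < end:
                if version > read_version:
                    db[key] = value     # only replay what the chunk's read missed
                break
    return db
```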

Because the ranges are being backed up at different versions, you won’t need to keep this data in the database for any extra time.

Correct.

Snapshots are taken repeatedly, so after the first 10-day snapshot completes, another one starts (this all assumes you are using continuous mode). Restoring to day 19 requires using the first snapshot, given that the second hasn’t completed yet, but restoring to day 21 could be done with the second snapshot plus all of the mutations from the start of it up to the restore point.

Cool! Thanks for all the help! Now I think I understand it completely :slight_smile: