Trade-offs of -s <DURATION> / --snapshot_interval <DURATION> for continuous mode backups?

(Ray Jenkins) #1

Hi All,

I’ve recently built 6.0.16 from source in order to get access to the S3/blobstore backup fixes. We’re running it in staging at the moment and it seems to be working well. Using a relatively small working set for testing (~20GB), we’ve successfully verified restoring from backups. However, I am curious about potential trade-offs using the duration switch in continuous mode with much larger working sets on restore.

I’ve read the post “Trying to understand the backup mechanism better”. From the description, this feature apparently allows you to amortize the cost of backups over the specified window, with the default being 10 days, which is great.

I’m wondering, though, whether this will have a significant impact on mean time to rebuild during an fdbrestore. So far, on small data sets, the restore process has been relatively quick. Does anyone have experience with continuous mode backups on larger working sets and, if so, is there any relationship between mean time to restore and --snapshot_interval?

(Steve Atherton) #2

It really depends on your write workload. During a restore, the most recent complete snapshot must be written to the database. In addition, all mutations that occurred during that snapshot and afterwards, up to your restore point, must be replayed. The more mutations you have from this time period, the longer the restore will take.

For example, a worst-case scenario would be a database where nearly all keys are deleted or modified about once every day and the database size is relatively stable. In this case, a snapshot window of 10 days and a restore operation on day 19 would result in 20 ‘databases’ worth of data being written by restore: one database-sized set of mutations for each of the 19 days, plus the snapshot.

You would get a faster restore time with 12-hour snapshots; however, this will also cost you more read ops on your cluster and more write ops on your backup medium.
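
To make the arithmetic above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions that are mine, not taken from the FoundationDB docs: snapshots run back to back, each one is spread evenly over the whole interval, and the write workload rewrites a fixed fraction of the database per day. The function names and the `daily_churn` parameter are hypothetical.

```python
import math

def restore_write_volume(snapshot_interval_days, restore_day, daily_churn=1.0):
    """Estimate data written by a restore, in units of 'database sizes'.

    Assumption: snapshots run back to back, so the most recent *complete*
    snapshot is the last one that finished on or before restore_day.
    daily_churn is the fraction of the database rewritten per day (assumed constant).
    """
    if restore_day < snapshot_interval_days:
        raise ValueError("no complete snapshot exists yet")
    completed = math.floor(restore_day / snapshot_interval_days)
    # Start time of the most recent complete snapshot.
    snapshot_start = (completed - 1) * snapshot_interval_days
    # Mutations from the start of that snapshot up to the restore point are replayed.
    replay_days = restore_day - snapshot_start
    return 1.0 + replay_days * daily_churn

def snapshot_reads_per_day(snapshot_interval_days):
    """Database-sized scans per day that the snapshots cost the cluster."""
    return 1.0 / snapshot_interval_days

print(restore_write_volume(10, 19))   # 20.0 (10-day window, restore on day 19)
print(restore_write_volume(0.5, 19))  # 1.5  (12-hour window, same restore point)
print(snapshot_reads_per_day(10))     # 0.1
print(snapshot_reads_per_day(0.5))    # 2.0
```

With the worst-case workload, the 10-day window gives the 20 database-equivalents described above because the second snapshot is still incomplete on day 19, while a 12-hour window drops the restore to about 1.5 at the cost of reading the full database twice a day instead of once every 10 days. Real numbers will depend on your actual mutation rate.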

(Ray Jenkins) #3

Thanks @SteavedHams! I will follow up with some numbers after we perform a bit more testing.