Trying to understand the backup mechanism better

markus.pilman · July 25, 2018, 10:58pm

Hi all,

I am currently trying to understand the backup mechanism of FDB better (the way it is implemented and where in the code I can find it). And I was hoping that someone could give me some pointers.

My current high-level understanding is the following:

When a backup starts, the client that kicks off the backup writes a few keys into the \xff keyspace. Honestly, I am not even sure what exactly it writes there.
Then backupagent picks this up and executes range queries against the normal keyspace.
Simultaneously, all new mutations are written into another special keyspace (I think \xff/bklog is used for that).
When BackupAgent is done reading the whole key range, it also reads the bklog to fetch the new mutations that occurred during this time.

However, I can’t really figure out where in the code this is happening. I would assume that the proxies are adding the mutations to the bklog somewhere, but I can’t find the code that does that. Also, is there something else I am missing?

Furthermore, I am not quite sure I understand how the TaskBucket thingy that backupagent uses actually works.

Any pointers to code or links to docs I missed would be greatly appreciated!

Thanks Markus

alexmiller · July 27, 2018, 8:14pm

@SteavedHams is going to have the best explanation for you here.

mbhaskar · July 31, 2018, 5:55am

There is some explanation of the design here

SteavedHams · August 4, 2018, 4:16am

Note that the design document @mbhaskar linked is a bit out of date: continuous backup is now implemented, and snapshots run continuously at a rate based on a user-chosen snapshot interval (except for the first one, which runs at full speed). This serves to normalize the bandwidth and workload effects of constantly running backups.

Backup agents use TaskBucket to divide the work of a backup into many small Tasks and execute them transactionally. When a backup is created, a single Task is created which represents the start of the backup. When that Task is executed by an agent, it creates more Tasks, and those create more, and so on until all of the work is done. The new Tasks can be set to run in parallel, at an (approximate) scheduled time, or serially in response to the completion of some other specific Task.
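
To make the fan-out idea concrete, here is a rough toy model in Python. It is only a sketch of the pattern described above, not the real TaskBucket: the class and function names (ToyTaskBucket, start_backup, snapshot_range, finish_backup) are made up for illustration, everything lives in memory instead of in \xff, and there are no transactions or scheduled times.

```python
from collections import deque


class ToyTaskBucket:
    # In-memory stand-in for a task bucket (hypothetical, not the FDB class).
    def __init__(self):
        self.ready = deque()   # tasks waiting for an agent
        self.executed = []     # names of finished tasks, in order

    def add(self, name, func, **params):
        self.ready.append((name, func, params))

    def run_one(self):
        # An agent takes one task and runs it; the task may add more tasks.
        if not self.ready:
            return False
        name, func, params = self.ready.popleft()
        func(self, **params)
        self.executed.append(name)
        return True


def start_backup(bucket, ranges):
    # The single initial task fans out into one task per key range.
    remaining = {"count": len(ranges)}
    for begin, end in ranges:
        bucket.add(f"snapshot {begin}-{end}", snapshot_range,
                   begin=begin, end=end, remaining=remaining)


def snapshot_range(bucket, begin, end, remaining):
    # A real task would read [begin, end) and write it out; here we only
    # track completion so a follow-up task can run after the last range.
    remaining["count"] -= 1
    if remaining["count"] == 0:
        bucket.add("finish", finish_backup)


def finish_backup(bucket):
    pass  # placeholder for whatever runs once all ranges are done


bucket = ToyTaskBucket()
bucket.add("start", start_backup, ranges=[("a", "m"), ("m", "z")])
while bucket.run_one():
    pass
print(bucket.executed)
```

In this toy the two snapshot tasks could be picked up by different agents in any order, while "finish" only appears once the last of them has completed, which mirrors the parallel and serial cases mentioned above.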

TaskBucket state is stored in \xff, and \xff is not included as part of a backup.

The initial backup Task, as part of its completion transaction, sets a special key in \xff which tells proxies to start logging mutations for a specific key range to another place in \xff. This is initiated in StartFullBackupTaskFunc::_finish().

The proxies react to the new configuration key in \xff, which tells them to start saving mutations, in this block, which is where changes to various \xff keys are handled.

The proxies insert the committed (after conflict resolution) mutations into the configured destination keyspace(s) here in commitBatch().
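
As a very rough illustration of that last part, the following Python toy shows the general shape: a config key switches logging on for a key range, and every later committed mutation in that range is copied to a destination prefix. This is a sketch under assumptions, not the actual proxy code. ToyProxy, the key prefixes and the log key layout are all invented for the example, and a plain dict stands in for the database.

```python
# Hypothetical prefixes; the real keys in \xff are different.
LOG_CONFIG_PREFIX = b"\xff/toy/logRanges/"
LOG_DEST_PREFIX = b"\xff/toy/log/backup1/"


class ToyProxy:
    def __init__(self):
        self.db = {}          # committed key -> value
        self.log_ranges = []  # (begin, end, dest_prefix) learned from config keys
        self.version = 0      # one version per committed batch

    def commit_batch(self, mutations):
        # Mutations are assumed to have already passed conflict resolution.
        self.version += 1
        for key, value in mutations:
            if key.startswith(LOG_CONFIG_PREFIX):
                # React to the config key: start logging this range.
                self.log_ranges.append(value)
            self.db[key] = value
            if key.startswith(b"\xff"):
                continue  # this toy never logs system keys
            for begin, end, dest in self.log_ranges:
                if begin <= key < end:
                    # Copy the mutation into the destination keyspace.
                    log_key = dest + self.version.to_bytes(8, "big") + key
                    self.db[log_key] = value


proxy = ToyProxy()
proxy.commit_batch([(b"apple", b"1")])  # before logging is enabled
proxy.commit_batch([(LOG_CONFIG_PREFIX + b"backup1",
                     (b"", b"\xff", LOG_DEST_PREFIX))])
proxy.commit_batch([(b"banana", b"2")])  # now copied to the log
logged = sorted(k for k in proxy.db if k.startswith(LOG_DEST_PREFIX))
print(logged)
```

Only the write to banana ends up under the log prefix, because apple was committed before the config key was set.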

markus.pilman · August 24, 2018, 11:45pm

Thank you so much for this description! This is super helpful! I somehow didn’t find the design document by myself and this together with the description in this thread helps a lot in understanding the code!
