Trying to understand the backup mechanism better

Hi all,

I am currently trying to understand the backup mechanism of FDB better (the way it is implemented and where in the code I can find it). And I was hoping that someone could give me some pointers.

My current high-level understanding is the following:

  • When a backup starts, the client that kicks off the backup writes a few keys into the \xff keyspace. But honestly I am not even sure what exactly it writes there.
  • Then, backupagent picks this up and executes range queries against the normal keyspace.
  • Simultaneously, all new mutations are written into another special keyspace (I think \xff/bklog is used for that).
  • When backupagent is done reading the whole key range, it also reads the bklog to fetch new mutations that occurred during this time.

However, I can’t really figure out where in the code this is happening. I would assume that the proxies are adding the mutations to the bklog somewhere, but I can’t find the code that does that. Also, is there something else I am missing?

Furthermore I am not quite sure I understand how this TaskBucket thingy works that backupagent is using.

Any pointers to code or links to docs I missed would be greatly appreciated!

Thanks Markus

@SteavedHams is going to have the best explanation for you here.

There is some explanation on design here

Note that the design document @mbhaskar linked is a bit out of date in that continuous backup is now implemented and snapshots are continuous but at a rate based on a user-chosen snapshot interval (except for the first one which is full speed). This serves to normalize the bandwidth and workload effects of constantly running backups.

Backup agents use TaskBucket to divide the work of a backup into many small Tasks and execute them transactionally. When a backup is created, a single Task is created which represents the start of the backup. When that Task is executed by an agent, it creates more Tasks, and those create more, and so on until all of the work is done. The new Tasks can be set to run in parallel, at an (approximate) scheduled time, or serially in response to the completion of some other specific Task.
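To make the "Tasks create more Tasks" idea concrete, here is a minimal in-memory sketch in Python. It is a toy model and not the real TaskBucket API: the class ToyTaskBucket, the task names start_backup and dump_range, and the integer key range are all made up for illustration. In the real system, finishing a Task and adding its follow-up Tasks would happen in one transaction against state stored in \xff; the toy only imitates that by doing both steps together.

```python
from collections import deque


class ToyTaskBucket:
    """In-memory stand-in for a task queue. NOT the FoundationDB TaskBucket API."""

    def __init__(self):
        self.pending = deque()  # tasks waiting for an agent
        self.done = []          # tasks that have finished

    def add(self, name, params):
        self.pending.append((name, params))

    def run_one(self, handlers):
        """Take one task, execute it, then record completion and add follow-ups."""
        name, params = self.pending.popleft()
        follow_ups = handlers[name](params)
        # Assumption: in the real system these two steps share one transaction,
        # so a task is never marked done without its follow-ups being created.
        self.done.append((name, params))
        self.pending.extend(follow_ups)


def start_backup(params):
    """Hypothetical initial task: split the key range into smaller range tasks."""
    begin, end, chunk = params["begin"], params["end"], params["chunk"]
    return [
        ("dump_range", {"begin": b, "end": min(b + chunk, end)})
        for b in range(begin, end, chunk)
    ]


def dump_range(params):
    """Hypothetical leaf task: would read and store one range; creates no new tasks."""
    return []


handlers = {"start_backup": start_backup, "dump_range": dump_range}

bucket = ToyTaskBucket()
bucket.add("start_backup", {"begin": 0, "end": 10, "chunk": 4})
while bucket.pending:
    bucket.run_one(handlers)

for name, params in bucket.done:
    print(name, params)
```

The single start_backup task fans out into three dump_range tasks here. Scheduling at a given time or chaining on completion of another task is left out of the sketch.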

TaskBucket state is stored in \xff, and \xff is not included as part of a backup.

The initial backup Task, as part of its completion transaction, sets a special key in \xff which tells proxies to start logging mutations for a specific key range to another place in \xff. This is initiated in StartFullBackupTaskFunc::_finish().

The proxies react to the new configuration key in \xff that enables mutation logging in this block, which is where various kinds of \xff activity are handled.

The proxies insert the committed (after conflict resolution) mutations into the configured destination keyspace(s) here in commitBatch().
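As a rough mental model of the two proxy steps above, here is a small Python sketch. It is only an illustration under assumptions, not the actual proxy code: the class ToyProxy, the prefixes \xff/toy/logRanges/ and \xff/toy/mutationLog/, the tuple-valued config entry, and the log key layout are all hypothetical. The only parts taken from the description are that a key in \xff switches logging on for a key range, and that committed mutations in that range are then also written to a destination keyspace.

```python
CONFIG_PREFIX = b"\xff/toy/logRanges/"   # hypothetical config keyspace
LOG_PREFIX = b"\xff/toy/mutationLog/"    # hypothetical destination keyspace


class ToyProxy:
    """Toy model of a proxy that copies committed mutations into a log keyspace."""

    def __init__(self):
        self.data = {}        # stands in for the whole database, \xff included
        self.log_ranges = []  # (begin, end, destination prefix) currently logged
        self.version = 0      # commit version, bumped once per batch

    def commit_batch(self, mutations):
        """mutations: (key, value) sets that already passed conflict resolution."""
        self.version += 1
        for seq, (key, value) in enumerate(mutations):
            self.data[key] = value
            if key.startswith(CONFIG_PREFIX):
                # React to the config key: start logging [begin, end) to dest.
                begin = key[len(CONFIG_PREFIX):]
                end, dest = value
                self.log_ranges.append((begin, end, dest))
                continue
            for begin, end, dest in self.log_ranges:
                if begin <= key < end:
                    # Log keys sort by commit version, then position in the batch.
                    log_key = dest + b"%016x/%04x" % (self.version, seq)
                    self.data[log_key] = (key, value)


proxy = ToyProxy()
proxy.commit_batch([(b"a", b"1")])                             # before logging starts
proxy.commit_batch([(CONFIG_PREFIX + b"a", (b"m", LOG_PREFIX))])  # enable for [a, m)
proxy.commit_batch([(b"b", b"2"), (b"z", b"3")])               # only b is in range

logged = sorted(k for k in proxy.data if k.startswith(LOG_PREFIX))
print(logged)
```

Only the write to b ends up in the log keyspace: a was committed before logging was enabled, and z is outside the configured range.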

Thank you so much for this description! This is super helpful! I somehow didn’t find the design document by myself and this together with the description in this thread helps a lot in understanding the code!