Backup and restore using multiple backup_agents with a local URL

When using multiple backup_agents with a local URL on different nodes, the backup files may end up distributed across the nodes (at the same path on each node).

So, does combining the backup files from all of the nodes make a complete backup?

Then, does this complete backup need to be copied to each node at the time of recovery?
(Or is only one full copy needed, with the other nodes keeping just the parts they wrote at backup time?)

Is there any way to use these backup files for recovery in a distributed manner?
(That is, without gathering them together and copying the result to each node.)

The idea is that backups should go either to an S3-style blob service, or to a remote file system (such as NFS) that is mounted locally at the same path on every host running a backup_agent, so that a local path such as file:///mount/some_filesystem/backup123 can be used to read and write the backup from any of those hosts.
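As a rough sketch of the shared-filesystem approach (the mount point and backup name here are hypothetical, and the fdbbackup/fdbrestore invocations are left commented out since they require a running cluster):

```shell
# Hypothetical NFS mount, present at the same path on every backup_agent host.
MOUNT=/mount/some_filesystem
BACKUP_URL="file://${MOUNT}/backup123"

# Start a backup to the shared path (run on any host with fdbbackup):
#   fdbbackup start -d "$BACKUP_URL"
# Later, restore from the same URL:
#   fdbrestore start -r "$BACKUP_URL"

echo "$BACKUP_URL"
```

Because every agent sees the same path, any of them can read or write any file in the backup.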

If you do choose to write to a path which is local-only on each host, then yes: if you combine all of the files into the same folder (preserving their paths within the backup folder), you will have a complete backup, and every backup_agent will need access to that folder during a restore.
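For illustration, merging the per-node local folders might look like the following. The directory and file names here are made up; a real FoundationDB backup folder has its own internal layout, and the point is only that each file's path within the backup folder must be preserved when merging:

```shell
# Work in a scratch directory; node1/ and node2/ stand in for two hosts'
# local-only copies of the same backup path.
work=$(mktemp -d)
cd "$work"

mkdir -p node1/backup123/logs node2/backup123/ranges
echo log-data   > node1/backup123/logs/log-00001.log
echo range-data > node2/backup123/ranges/range-00001.range

# Merge both partial copies into one complete backup folder,
# keeping each file at the same relative path it had on its node.
mkdir -p merged/backup123
cp -R node1/backup123/. merged/backup123/
cp -R node2/backup123/. merged/backup123/

ls -R merged/backup123
```

The merged folder is then what every backup_agent would need to be able to read during a restore.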

I tried restoring from such a backup, and the current state is: only the node executing fdbrestore has the full backup, the other nodes have only part of it, and yet the restore succeeds. Does this mean that not every backup_agent needs a full copy of the backup?

Well, now that you mention it… Yes! It does mean that, strictly speaking, not all of the agents need all of the files, and the restore will succeed in a somewhat silly way that I had not considered.

As long as you start the restore from a host with all of the files, fdbrestore will place the initial restore task info into the database. From there, there is one more task that needs access to all of the files to function. That task will run on a random backup agent, and if that backup agent cannot see all of the files then it will fail, and retry again on another random backup agent. Eventually it will run on the host that can see all the files and will complete.

After that first task, there will be many restore tasks, each doing a part of the restore in parallel. Each restore task points to a specific file to read, by its file path. These tasks are assigned to random agents as well, and if a particular host does not have the file needed by a particular task, the task will fail and be randomly retried on another backup agent. This will continue until it runs on a backup agent that does have the file, at which point it will succeed.
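A toy simulation of that retry behavior (this is not FDB code, just an illustration): three pretend agents, only one of which can see a given file, with the task re-dispatched at random until it lands on that agent:

```shell
# Toy model: agent3 is the only "host" that has the backup file.
has_file() { [ "$1" = "agent3" ]; }

attempts=0
winner=""
while [ -z "$winner" ]; do
  attempts=$((attempts + 1))
  # Pick a random agent, as the task scheduler effectively does.
  pick=$(shuf -e agent1 agent2 agent3 -n 1)
  if has_file "$pick"; then
    winner=$pick
  fi
done
echo "task succeeded on $winner after $attempts attempt(s)"
```

Each individual failure is cheap, but with many tasks and few hosts holding the files, the retries add up.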

So yes, it turns out that technically not all of the backup agents need access to all of the files; one host with everything will suffice. There will just be a lot of random retries until every task finds what it's looking for.

Thanks for trying this, I found it quite amusing!