Backup/restore of a large cluster in a distributed way

shubham4060 · January 13, 2021, 6:02am

Hi,

I have been experimenting with backup/restore of a cluster to/from a local file. Here is my understanding:

The backup agent processes are the daemon processes that actually execute the backup and restore of the cluster. When an fdbbackup client starts a backup of the cluster, the request is queued in the cluster and then assigned to the backup agents, which do the actual backup/restore work.
If multiple backup agents are running across several hosts, the backup job is distributed across those agents, so each agent writes its part of the backup to the path given to the fdbbackup client, in the local directory of the host on which that agent is running. (A subdirectory for the backup job, with the same timestamp, is created under the provided directory path on every host that has backup agents running.)
During restore, the backup agent requires all the files. In fact, at least one backup agent must have all the files: each backup agent attempts the restore, and if it doesn't have all the files it fails and another backup agent retries, so the restore eventually succeeds (see the thread "Backup and restore using multiple backup_agent with local url"). Therefore, for backup agents spread across several machines, the backup files from every host first need to be gathered in a single place where a backup agent can access them before the database can be restored.
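To make the restore requirement concrete, here is a minimal Python sketch of the reasoning above. It assumes each host's backup directory can be summarised as a set of file names; the host names and file names are hypothetical and do not reflect FoundationDB's actual backup file layout. It only illustrates that a restore from a local path can succeed on a host whose directory contains the union of what all agents wrote.

```python
from typing import Dict, List, Set


def hosts_able_to_restore(files_by_host: Dict[str, Set[str]]) -> List[str]:
    """Return the hosts whose local directory holds every file of the backup."""
    # The complete backup is the union of the partial backups on all hosts.
    all_files: Set[str] = set().union(*files_by_host.values())
    # A host qualifies only if its local files cover that whole union.
    return sorted(host for host, files in files_by_host.items() if files >= all_files)


# Hypothetical example: two agents each wrote part of the backup locally.
partial = {
    "host-a": {"range-001", "log-001"},
    "host-b": {"range-002"},
}
print(hosts_able_to_restore(partial))   # [] : no single host has everything

# After gathering every file on one more host, that host can run the restore.
gathered = dict(partial)
gathered["host-c"] = {"range-001", "range-002", "log-001"}
print(hosts_able_to_restore(gathered))  # ['host-c']
```

With a shared location that every agent can read, each agent sees the full set, which is why a common directory sidesteps the problem.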

Please correct me if I am wrong. Now, a few questions about the behaviour I observed.
For multiple backup agents on different machines, is the distribution of key ranges across those agent processes uniform in terms of the size of the partial backup each will produce?

If I want to control the backup agents manually, for example so that only a particular set of backup agent processes executes a particular fdbbackup client job, can I do that?

After running a backup with backup agents on several machines, I have partial backup files spread across those machines. Is there a way to restore the cluster directly from these partial backup files, rather than first gathering them on a single host?

Our use case is that we have a very large cluster and, of course, want to back it up. For the backup, spawning multiple backup agents across several machines lets us take a partial backup on each of those machines (assuming the distribution of key ranges across backup agents for the backup job is somewhat uniform). The restore, however, requires all those partial backup files to be gathered in a single place first, and getting a single disk with that much space is not feasible. Is there any way to achieve this?

alexmiller · January 13, 2021, 8:28pm

Is backing up to/from S3 (or an S3-compatible blob storage system) not an option? Or could you instead do a file backup to/from some NFS instance so that all files are accessible and in one place for all backup agents?
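As a rough illustration of the shared-directory option, the Python sketch below only builds (and prints) the command lines; it does not run them. The mount point `/mnt/fdb_backups`, the cluster file path, and the helper names are assumptions for illustration, and the flags (`-d`, `-C`, `-r`, `--dest_cluster_file`) are as I recall them from the 6.x command-line tools, so check `fdbbackup --help` and `fdbrestore --help` for your version. The directory is assumed to be mounted at the same path on every host that runs a backup agent.

```python
import shlex
from typing import List


def backup_command(backup_url: str, cluster_file: str) -> List[str]:
    # Start a backup of the cluster to the given backup URL.
    return ["fdbbackup", "start", "-C", cluster_file, "-d", backup_url]


def restore_command(backup_url: str, cluster_file: str) -> List[str]:
    # Start a restore of the given backup into the destination cluster.
    return ["fdbrestore", "start", "--dest_cluster_file", cluster_file, "-r", backup_url]


# Hypothetical paths: a directory on NFS that every backup agent host mounts.
shared_url = "file:///mnt/fdb_backups"
cluster_file = "/etc/foundationdb/fdb.cluster"

print(shlex.join(backup_command(shared_url, cluster_file)))

# For the restore, the URL should point at the timestamped subdirectory that
# the backup job created; the placeholder below is left unfilled on purpose.
print(shlex.join(restore_command(shared_url + "/<backup-subdirectory>", cluster_file)))
```

Because every agent reads and writes the same directory, no single host needs a local disk large enough for the whole backup; the capacity requirement moves to the shared storage instead.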
