Backup/restore of a large cluster in a distributed way

Hi,

I have been experimenting with backing up and restoring a cluster to and from a local file. Here is my understanding so far:

  • The backup agents are the daemon processes that actually execute the backup and restore of the cluster. When an fdbbackup client starts a backup, the request is queued in the cluster and then assigned to the backup agents, which do the actual work.

  • If multiple backup agents are running across several hosts, the backup job is distributed across those agents. Each agent writes its part of the backup to the path given to the fdbbackup client, on the local filesystem of the host that agent runs on. On every host that runs a backup agent, a subdirectory for the backup job, with the same timestamp, is created under that path.

  • During restore, a backup agent needs all of the backup files. In fact, at least one backup agent must have all the files: each agent attempts the restore, fails if it does not have every file, and another agent then retries, so the restore eventually succeeds (see the thread "Backup and restore using multiple backup_agent with local url"). Therefore, when backup agents run across several machines, the backup files from every host must first be gathered in a single place that a backup agent can access before the database can be restored.
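To picture the completeness requirement in the last point, here is a minimal Python sketch. It is only an illustration of my understanding, not how the backup agents work internally: the host names and file names are made up, and `missing_files_per_host` and `hosts_able_to_restore` are hypothetical helpers that compare per-host file listings.

```python
from typing import Dict, List, Set


def missing_files_per_host(files_by_host: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each host, list the backup files that exist somewhere but not on that host."""
    # The full backup is assumed to be the union of what every host holds.
    all_files: Set[str] = set().union(*files_by_host.values())
    return {host: sorted(all_files - files) for host, files in files_by_host.items()}


def hosts_able_to_restore(files_by_host: Dict[str, Set[str]]) -> List[str]:
    """Hosts that hold every file, i.e. where a local agent could complete a restore."""
    missing = missing_files_per_host(files_by_host)
    return sorted(host for host, gaps in missing.items() if not gaps)


# Hypothetical listings: each agent wrote only its own part of the backup.
partial = {
    "host-a": {"range-0001", "log-0001"},
    "host-b": {"range-0002"},
}

# Hypothetical listings after copying everything onto host-a.
gathered = {
    "host-a": {"range-0001", "range-0002", "log-0001"},
    "host-b": {"range-0002"},
}
```

With the `partial` listing no host qualifies, which matches the restore failures described above; with the `gathered` listing only `host-a` does.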

Please correct me if I am wrong. I also have a few questions about the behaviour I observed.
With multiple backup agents on different machines, is the distribution of key ranges across those agent processes uniform in terms of the size of the partial backup each one produces?

Can I manually control which backup agents are used? For example, can I make only a particular set of backup agent processes execute a particular fdbbackup client job?

After running a backup with backup agents across several machines, I have partial backup files spread over those machines. Is there a way to restore the cluster directly from these partial backup files where they are, rather than first gathering them on a single host?

Our use case is that we have a very large cluster, in terms of data size, and of course we want to back it up. For the backup, spawning multiple backup agents across several machines lets us take a partial backup on each of those machines (assuming the distribution of key ranges across backup agents for the job is roughly uniform). The restore, however, requires all of those partial backup files to be gathered in a single place first, and getting a single disk with that much space is not feasible. Is there any way to achieve this?

Is backing up to/from S3 (or an S3-compatible blob storage system) not an option? Or could you instead do a file backup to/from some NFS instance so that all files are accessible and in one place for all backup agents?
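As a rough sketch of the NFS variant, assuming the same share is mounted at the same path on every host that runs a backup agent: the Python helpers below only build the command lines and do not run them. The mount point, cluster file path, tag, and backup subdirectory name are hypothetical, and the flag spellings (`-d`, `-C`, `-t`, `-r`, `--dest_cluster_file`) are the ones I believe the tools accept, so check them against the help output of your installed version.

```python
import shlex
from typing import List, Optional


def backup_start_cmd(backup_dir: str, cluster_file: str, tag: Optional[str] = None) -> List[str]:
    """Build an 'fdbbackup start' command that writes to a shared directory."""
    # backup_dir is an absolute path, so this yields a file:///... URL.
    cmd = ["fdbbackup", "start", "-C", cluster_file, "-d", "file://" + backup_dir]
    if tag:
        cmd += ["-t", tag]
    return cmd


def restore_start_cmd(backup_path: str, dest_cluster_file: str) -> List[str]:
    """Build an 'fdbrestore start' command that reads one backup from the shared directory."""
    # Flag spelling is an assumption and may differ between versions.
    return ["fdbrestore", "start", "-r", "file://" + backup_path,
            "--dest_cluster_file", dest_cluster_file]


def as_shell(cmd: List[str]) -> str:
    """Render an argument list as a single shell-quoted string for copy and paste."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Hypothetical paths: an NFS mount visible to every backup agent.
NFS_DIR = "/mnt/nfs/fdb-backups"
CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"

backup_cmd = backup_start_cmd(NFS_DIR, CLUSTER_FILE, tag="nightly")
# "backup-example" stands in for the timestamped subdirectory the backup job creates.
restore_cmd = restore_start_cmd(NFS_DIR + "/backup-example", CLUSTER_FILE)
```

Because every agent sees the same directory, any agent that picks up the restore can read every file, so no extra copy step onto one host is needed.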