We have a 3-node FDB cluster on version 6.1.8. We configured backups to run on one of the nodes to the local file system, and set up a cron job to run nightly backups and copy them to S3. We noticed that on some days the backups are not being taken. When I run the same script from the command line, I see the same issue: sometimes it backs up, and sometimes I just see this:
[root@ip-10-152-41-88 backup]# /data/fdb_backup_script.sh
Submitted and now waiting for the backup on tag `default' to complete.
[root@ip-10-152-41-88 backup]# cd /data/
We tried a couple more times and found that the backup is being taken on one of the nodes (a different one each time), not necessarily the node from which we are running the script.
You probably have a backup_agent running on each node, and thus backup files are being emitted on different nodes as the agents randomly claim responsibility for (parts of) the backup. It’d be best to just archive the backup files from all of your nodes together, so that you get parallelism in your backup and don’t overload one disk with storing the whole database. If you need the backup emitted entirely on one node, then you’d want to run one or more backup_agent processes only on that one node.
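For reference, on a typical packaged Linux install those agents are started by fdbmonitor from /etc/foundationdb/foundationdb.conf. A minimal sketch of the relevant section (the paths and section names below assume the standard packages, so adjust them to your layout):

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent

[backup_agent.1]

Each [backup_agent.N] entry launches one agent on that node, so removing or commenting out those entries on the other nodes and restarting the foundationdb service leaves agents running only where you want them.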
So you mean that if I want the backup to run only on one node, I need to stop the backup agents on the other nodes? And in that case, will I have the entire data backup on that node?
Your fdb_backup_script.sh is requesting that a backup be taken, but backup_agent is what actually takes the backup, in chunks handled by different backup_agent processes in parallel. If you need all of the backup files to be emitted on one node, then you need to run backup_agents only on that one node.
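In other words, the submission side and the work side are separate. A minimal sketch of that split, with a placeholder destination path (exact flags can vary a little between versions, so check fdbbackup --help):

# Submit a backup request and wait for it to complete; this only schedules the work
fdbbackup start -d file:///data/backup -t default -w

# Check progress from any node; the actual copying is done by whichever backup_agent
# processes are connected to the cluster, not by fdbbackup itself
fdbbackup status -t default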
"Restored to version" is the version the DB will be restored to.
Only after you run your backup agents for a while will the backup become restorable.
You can try the describe option of fdbbackup, the tool that triggers the backup.
That option outputs a description of the backup. The restorable version is the minimum version your DB can be restored to.
The DB can be restored to any version after the restorable version, assuming you are not deleting the backup data.
This is correct. Also, if you are using fdbbackup expire to remove older data, the change in the restorable version range will be reflected in subsequent fdbbackup describe output.
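For example, something along these lines, where the backup folder name is a placeholder of my own and the expire flag is what I would expect in this version (check fdbbackup --help to confirm):

# Print the backup's description, including the minimum and maximum restorable versions
fdbbackup describe -d file:///data/backup/<backup_folder_name>

# Drop data before a given version; a later describe will then report a newer minimum restorable version
fdbbackup expire -d file:///data/backup/<backup_folder_name> --expire_before_version <version>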
The error is saying that one of the backup_agent processes failed to open a file from the restore set at one point in time. Apparently, since the restore did succeed later, it or another backup_agent process was able to open the file eventually.
To be clear, the message indicating completion means the restore was completely successful, regardless of any errors encountered during the restore process.
If you would like to know which file could not be accessed, look at the trace logs from your backup_agent processes, but note that by default the agents do not write trace logs, so you may not be generating any.
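If you want those trace logs, one way to enable them is via the conf file, a sketch under the assumption that the logdir parameter is honored for backup_agent the same way it is for the fdbserver sections (verify against backup_agent --help, which should list a matching --logdir option):

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]

After restarting the foundationdb service, the agents should write trace files into that directory alongside the fdbserver ones.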
No. It means that backups take time to finish writing enough data into the backup destination so that they are restorable. The work of backups is done by the backup agents.
I was trying to test the following scenario: taking local backups in a cluster and uploading them to S3. The entire backup folder is uploaded as a tar.gz. I then downloaded this tar onto another cluster and tried restoring it; that is when I got the errors above. I heard from a colleague that there will be missing metadata when local backups are uploaded to S3 and later used for restore.
Can you please explain this further? Also, when we upload to S3, we upload the complete backup with all folders tarred and zipped.
Backup destinations specified with file:// and blobstore:// are not compatible with each other. Someone recently asked about writing backups to local folders and then transplanting them to S3 and restoring from there (with a blobstore:// URL, I assumed). Currently that is not possible, and there are no plans to make it possible, as you can and should back up directly to S3 instead.
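As a sketch, backing up directly to S3 just means giving fdbbackup a blobstore:// destination instead of a file:// one. The URL below is only an illustration of the shape; the credentials, host, bucket, and connection parameters depend on your S3 endpoint, so check the backup documentation for your version:

fdbbackup start -d 'blobstore://<key>:<secret>@<s3_host>:443/nightly?bucket=<bucket_name>' -w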
So to clarify, you can NOT do this:
1. Create a backup with a file:// URL
2. Upload the directory and all of its files to S3
3. Restore the backup using a blobstore:// URL

But what you CAN do, which is the situation you describe, is:
1. Create a backup with a file:// URL
2. Upload the directory and all of its files to S3
3. Download the directory from S3 to some local folder
4. Restore the backup with a file:// URL that points to the local folder downloaded to in step 3.
In this scenario, the fact that the files were copied somewhere and back makes no difference; restore just needs to see all of the files the backup wrote, in whatever input directory you give it.
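A rough sketch of that round trip, using the aws CLI and placeholder folder and bucket names of my own (fdbrestore options differ a little across versions, e.g. newer releases expect --dest_cluster_file, so verify with fdbrestore --help):

# On the source cluster: archive the whole backup folder and push it to S3
tar czf backup.tar.gz -C /data/backup <backup_folder_name>
aws s3 cp backup.tar.gz s3://<bucket_name>/backup.tar.gz

# On the destination cluster: pull it back down, unpack it, and restore from the local copy
aws s3 cp s3://<bucket_name>/backup.tar.gz .
mkdir -p /data/restore && tar xzf backup.tar.gz -C /data/restore
fdbrestore start -r file:///data/restore/<backup_folder_name>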
Thanks, I was trying to do some testing with backup agents … but I was unable to invoke the backup_agent command line tool:
[root@ip-10-148-41-166 foundationdb]# backup_agent
bash: backup_agent: command not found
[root@ip-10-148-41-166 foundationdb]# where is backup_agent
bash: where: command not found
[root@ip-10-148-41-166 foundationdb]# which backup_agent
/usr/bin/which: no backup_agent in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/bin)
[root@ip-10-148-41-166 foundationdb]#
To stop the backup agents, I already commented out the section in the conf file and restarted the fdb processes, so the backup agent is not running anymore. I wanted to check how to start the backup agent on only one node through the backup_agent command line tool as mentioned in the documentation, but I am unable to find that tool.
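In case it helps: with the standard packages, backup_agent is not on PATH, which is why which cannot find it. It is typically installed at /usr/lib/foundationdb/backup_agent/backup_agent (an assumption based on the usual Linux package layout, so adjust if your install differs). To run a single agent by hand on just that node, something like the following, and check backup_agent --help for the exact options your version accepts:

/usr/lib/foundationdb/backup_agent/backup_agent -C /etc/foundationdb/fdb.cluster --logdir /var/log/foundationdb

Alternatively, re-enable the [backup_agent] and [backup_agent.1] sections in foundationdb.conf on that one node only and let fdbmonitor keep the agent running for you.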