Backup/restore fdb

Minaxi · September 19, 2019, 5:04pm

We have a 3-node fdb cluster on version 6.1.8. We configured backups to run on one of the nodes, writing to that node's local file system, and set up a cron job to run nightly backups and copy them to S3. We noticed that on some days the backups are not being taken. When I run the same script from the command line, I see the same issue: sometimes it backs up, and sometimes I just see this:
[root@ip-10-152-41-88 backup]# /data/fdb_backup_script.sh
Submitted and now waiting for the backup on tag `default' to complete.
[root@ip-10-152-41-88 backup]# cd /data/

and it returns to the prompt with no backup.

Any idea what is happening here?

alexmiller · September 19, 2019, 5:50pm

Are you seeing that some days the backup is correctly being taken?

Minaxi · September 19, 2019, 7:43pm

We tried a couple more times and found that the backup is being written on one or another of the nodes each time, not on the node from which we run the script.

Minaxi · September 19, 2019, 9:11pm

In another instance, when we started a backup from one node, we saw that it was created on the other 2 nodes with different KV ranges.

2nd node

data/backup/fdb/backup-2019-09-19-20-47-19.201742/kvranges/snapshot.000024439621753268/0

total 3288
-rw-r--r--. 1 foundationdb foundationdb 1216640 Sep 19 20:47 range,24439621837969,8a1aefc523fb2abc0a3b2fd577ba4a64,1048576
-rw-r--r--. 1 foundationdb foundationdb 555264 Sep 19 20:47 range,24439621869660,fea39762bd513c2201554085620b905f,1048576
-rw-r--r--. 1 foundationdb foundationdb 1162069 Sep 19 20:47 range,24439621869660,3ccc375de709992ba2d7703cd857228a,1048576
-rw-r--r--. 1 foundationdb foundationdb 425289 Sep 19 20:47 range,24439621890133,403b4447d6778f7d0616634c1b0d36a6,1048576

3rd node

/data/backup/fdb/backup-2019-09-19-20-47-19.201742/kvranges/snapshot.000024439621753268/0
[root@ip-10-152-42-103 0]# ls -ltr
total 3328
-rw-r--r--. 1 foundationdb foundationdb 341880 Sep 19 20:47 range,24439621839159,a16530ad88b1024de7c74a9b5315bb5b,1048576
-rw-r--r--. 1 foundationdb foundationdb 562432 Sep 19 20:47 range,24439621859742,8bcd6f38fae68080920e188435836280,1048576
-rw-r--r--. 1 foundationdb foundationdb 499468 Sep 19 20:47 range,24439621892688,e39df635cc2bfa0784a1e221a6a6f4fc,1048576
-rw-r--r--. 1 foundationdb foundationdb 1996036 Sep 19 20:47 range,24439621859742,a3b53c7b6b036eb941cfe773de2270b7,1048576

alexmiller · September 19, 2019, 9:35pm

You probably have a backup_agent running on each node, and thus you’re emitting backup files on the different nodes as they randomly claim responsibility for (parts of) the backup. It’d be best to just archive the backup files from all of your nodes together, so that you can get parallelism in your backup and not overload one disk with storing the whole database. If you need the backup emitted entirely in one node, then you’d want to run one or more backup_agent processes only on that one node.
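Archiving the files from all nodes together could look roughly like the sketch below. It assumes each node's copy of the backup folder has already been fetched to one staging host; the `merge_backup_dirs` name and the directory layout are hypothetical and not part of FoundationDB. If two sources contain a file with the same relative path, the later source overwrites the earlier one.

```shell
#!/usr/bin/env bash
# merge_backup_dirs DEST SRC...
# Hypothetical helper (not an fdb tool): copies the contents of every SRC
# directory into DEST, keeping the relative layout, so that range files
# written by backup agents on different nodes end up in a single tree.
merge_backup_dirs() {
    local dest=$1
    shift
    mkdir -p "$dest" || return 1
    local src
    for src in "$@"; do
        # "$src"/. copies the directory's contents rather than the directory itself
        cp -Rp "$src"/. "$dest"/ || return 1
    done
}
```

For example, `merge_backup_dirs /staging/merged /staging/node2 /staging/node3` (all paths hypothetical) would leave one tree under `/staging/merged` that can then be archived as a unit.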

Minaxi · September 19, 2019, 9:37pm

So you mean that if I want the backup to run on only one node, I need to stop the backup agents on the other nodes? And in that case, will I have the entire data backup on that node?

alexmiller · September 19, 2019, 9:40pm

Yes and yes.

Your fdb_backup_script.sh is requesting that a backup should be taken, but backup_agent is what actually takes the backup, done in chunks by different backup agent processes in parallel. If you need all of the backup files to be emitted in one node, then you need to only run backup_agents on that one node.

Minaxi · September 19, 2019, 10:55pm

OK, thanks, will try that.

I also have some questions about restore. We take nightly backups and store them in S3. Is each backup a full, restorable backup, or is it incremental?

When I tried restoring from a backup taken overnight, I saw the errors below,
and then at the end I also saw 'Restored to version …

So does that mean it has restored completely, or is anything missing?

Tag: default UID: 7e3d4ff576a75bde06a30d9ca53bdd7e State: running Blocks: 10/10 BlocksInProgress: 0 Files: 7 BytesWritten: 5749811 ApplyVersionLag: 1341944 LastError: ''File not found' on 'restore_range_data'' 59s ago.
Tag: default UID: 7e3d4ff576a75bde06a30d9ca53bdd7e State: completed Blocks: 10/10 BlocksInProgress: 0 Files: 7 BytesWritten: 5749811 ApplyVersionLag: 0 LastError: ''File not found' on 'restore_range_data'' 60s ago.
Restored to version 24177538741943

Minaxi · September 23, 2019, 5:40pm

Any suggestions on the restore error?

mengxu · September 24, 2019, 8:44pm

"Restored to version" is the version the DB was restored to.
A backup only becomes restorable after your backup agents have been running for a while.
You can try the describe option of the fdbbackup tool that you use to trigger the backup.
It outputs a description of the backup. The restorable version is the minimum version your DB can be restored to.
The DB can be restored to any version after the restorable version, assuming you are not deleting the backup.

@SteavedHams Correct me if I’m wrong.
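A minimal sketch of running the describe operation against a local backup folder. The `describe_backup` wrapper is a hypothetical convenience, and it assumes the argument is the absolute path of the backup folder itself (for the listings earlier in the thread, presumably the `backup-2019-09-19-20-47-19.201742` directory rather than its parent). The only fdb call it makes is `fdbbackup describe -d <URL>`.

```shell
#!/usr/bin/env bash
# describe_backup BACKUP_DIR
# Hypothetical wrapper: turns an absolute local path into a file:// URL
# and passes it to `fdbbackup describe`.
describe_backup() {
    local dir=$1
    case $dir in
        /*) ;;  # absolute path, fine
        *)
            echo "describe_backup: need an absolute path, got: $dir" >&2
            return 2
            ;;
    esac
    fdbbackup describe -d "file://$dir"
}
```

The description printed by the tool is where to look for whether the backup is restorable yet and, if so, from which version.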

SteavedHams · September 24, 2019, 8:48pm

This is correct, and also if you are using fdbbackup expire to remove older data then the change in restorable version range will be reflected in subsequent fdbbackup describe operations.

@Minaxi Please see https://apple.github.io/foundationdb/backups.html for more details on how backup/restore and their tools work.

SteavedHams · September 24, 2019, 8:51pm

The error is saying that one of the backup_agent processes failed to open a file from the restore set at one point in time. Apparently - since the restore did succeed later - it or another backup_agent process was able to open the file eventually.

SteavedHams · September 24, 2019, 8:52pm

To be clear, the message indicating completion means the restore was completely successful, regardless of any errors encountered during the restore process.

If you would like to know which file could not be accessed, look at the trace logs from your backup_agent processes, but note that by default the agents do not write trace logs, so you may not be generating any.
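Since the State field, not LastError, is what indicates the outcome, a script that watches a restore could pull out just that field. This is a sketch that assumes the status lines keep the `Tag: ... State: <word> Blocks: ...` layout pasted earlier in the thread; `restore_state` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# restore_state
# Reads restore status lines on stdin and prints the State value from the
# last line that contains one. Lines without a "State:" field are ignored.
restore_state() {
    sed -n 's/.*State: \([A-Za-z_]*\).*/\1/p' | tail -n 1
}
```

Fed the two status lines and the final "Restored to version" line from the post above (for instance from a saved log file), it prints `completed`.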

Minaxi · September 25, 2019, 3:18am

OK, good to know, that helps. Thanks, all.

Minaxi · September 27, 2019, 7:40pm

Just one question: what do you mean by “Only after you run your backup agents for a while, will the backup become restorable.” @mengxu?

Does it mean after we have taken many backups?

SteavedHams · September 27, 2019, 7:46pm

No. It means that backups take time to finish writing enough data into the backup destination so that they are restorable. The work of backups is done by the backup agents.

Minaxi · September 30, 2019, 8:59pm

I was trying to test the following scenario: taking local backups in a cluster and uploading them to S3. The entire backup folder is uploaded as a tar.gz. I then downloaded this tar onto another cluster and tried restoring it, and that's when I got the errors above. I heard from a colleague that there will be missing metadata when local backups are uploaded to S3 and later used for restore.

Can you please explain this further? Also, when we upload to S3, we upload the complete backup with all folders tarred and zipped.

SteavedHams · September 30, 2019, 9:39pm

Backup destinations specified with file:// and blobstore:// are not compatible with each other. Someone recently asked about writing backups to local folders and then transplanting them to S3 and restoring from there (with a blobstore:// URL, I assumed). Currently that is not possible, and there are no plans to make it possible as you can and should back up directly to S3 instead.

So to clarify, you can NOT do this:

1. Create a backup with a file:// URL
2. Upload the directory and all of its files to S3
3. Restore the backup using a blobstore:// URL

But what you CAN do, which is the situation you describe, is:

1. Create a backup with a file:// URL
2. Upload the directory and all of its files to S3
3. Download the directory from S3 to some local folder
4. Restore the backup with a file:// URL that points to the local folder downloaded to in step 3.

In this scenario, the fact that the files were copied somewhere and back makes no difference; restore just needs to see all of the files that were written during the backup in whatever input directory you give it.
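For the download-and-restore half of the working sequence, a sketch of unpacking the tarball and deriving the file:// URL might look like this. It assumes the tar.gz holds exactly one top-level directory (the backup folder) and was created without a leading `./`; `unpack_backup` and all paths are hypothetical.

```shell
#!/usr/bin/env bash
# unpack_backup TARBALL DEST
# Hypothetical helper: extracts a gzipped tar of a backup folder into DEST
# and prints a file:// URL pointing at the extracted folder.
unpack_backup() {
    local tarball=$1 dest=$2 top
    # First path component of the first archive entry, i.e. the backup folder name
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    [ -n "$top" ] || return 1
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Resolve DEST to an absolute path so the URL is valid from any directory
    printf 'file://%s/%s\n' "$(cd "$dest" && pwd)" "$top"
}
```

The printed URL is the kind of value that would be given to `fdbrestore start` as its `-r` argument; the remaining restore options are described in the backup documentation linked earlier in the thread.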

Minaxi · October 2, 2019, 9:52pm

Thanks. I was trying to do some testing with backup agents … but I was unable to invoke the backup_agent command line tool:
[root@ip-10-148-41-166 foundationdb]# backup_agent

bash: backup_agent: command not found

[root@ip-10-148-41-166 foundationdb]# where is backup_agent

bash: where: command not found

[root@ip-10-148-41-166 foundationdb]# which backup_agent

/usr/bin/which: no backup_agent in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/bin)

[root@ip-10-148-41-166 foundationdb]#

To stop the backup agents, I already commented them out in the conf file and restarted the fdb process, so the backup agent is not running anymore. I wanted to check how to start the backup agent on only one node through the backup_agent command line tool, as mentioned in the documentation, but I am unable to find it.

alexmiller · October 2, 2019, 10:16pm

Add /usr/local/foundationdb/ to $PATH.
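If the binary's location is unclear, a small search over candidate directories can confirm it before touching $PATH. This is a sketch: `find_tool` is a hypothetical helper, and any directory other than the one named above is an assumption to verify on your own system.

```shell
#!/usr/bin/env bash
# find_tool NAME DIR...
# Hypothetical helper: prints the full path of the first DIR/NAME that is
# an executable regular file, or returns non-zero if none is found.
find_tool() {
    local name=$1 dir
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

A call such as `find_tool backup_agent /usr/local/foundationdb /usr/lib/foundationdb/backup_agent` checks the directory suggested above plus a second one where Linux packages are assumed to place the agent. Whichever directory matches can then be appended with `export PATH="$PATH:<that directory>"`.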
