Regarding issue faced while running fdbbackup

When I run fdbbackup (a binary I built from the GitHub source), I get this message: “The backup on tag `default' was successfully submitted but no backup agents are responding”. How can I resolve this?

Running fdbbackup starts the backup job, but it’s the agents that are responsible for actually doing the work. You’ll need to have at least one running for backup to actually make any progress. Depending on the size of your cluster and how quickly you want backup to go, you may want to run more than that.

If you are using fdbmonitor to run your processes, these can be started by configuring them in foundationdb.conf, like so:

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]

If you are running your processes through some other mechanism, then you just need to start the agent processes with the cluster file for your cluster. See https://apple.github.io/foundationdb/backups.html?highlight=backup#backup-agent-command-line-tool for more details.
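As a rough sketch of the "some other mechanism" case: the helpers below start one agent by hand and then ask the cluster about the backup. The function names are made up for illustration, and the sketch assumes backup_agent and fdbbackup are on your PATH. The -C and --logdir options are the same ones that appear in the commands elsewhere in this thread.

```shell
# Start one backup agent in the background against the given cluster file.
# Arguments: <cluster-file> <log-dir>
start_backup_agent() {
  backup_agent -C "$1" --logdir "$2" &
  echo "started backup_agent with pid $!"
}

# Ask the cluster about the backup on a tag ("default" is the tag from the question).
# Arguments: <cluster-file> [tag]
backup_status() {
  fdbbackup status -C "$1" -t "${2:-default}"
}
```

For example: start_backup_agent /etc/foundationdb/fdb.cluster /var/log/foundationdb, then backup_status /etc/foundationdb/fdb.cluster.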

hey @ajbeamon
How do I verify that I am running the processes using fdbmonitor? I installed the cluster using the FoundationDB operator documented here.
Apart from that, the FoundationDB documentation you linked says that we don't need to start backup_agent manually; it usually runs automatically on the machine.

You can run ps -ef | grep fdb and then verify that fdbmonitor is running and that its PID is the same as the parent PID (PPID) of the fdbserver process.

Something like this:

ubuntu    1087     1  0 Dec12 ?        00:00:00 /usr/bin/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdb/fdbmonitor.pid --daemonize
ubuntu    1099  1087  0 Dec12 ?        00:18:54 /usr/lib/foundationdb/backup_agent/backup_agent --cluster_file /etc/foundationdb/fdb.cluster --logdir /var/log/foundationdb
ubuntu    1100  1087 10 Dec12 ?        09:29:54 /usr/bin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --public_address auto:4500
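If you would rather not compare the PID and PPID columns by eye, the check can be scripted. This is only a sketch: the function name is made up, and it assumes your ps supports -eo pid=,ppid=,comm= (the procps version on most Linux systems does).

```shell
# Print every fdbserver/backup_agent process whose parent is an fdbmonitor process.
# Reads "PID PPID COMMAND" rows from stdin so it can be fed straight from ps.
fdb_children_of_monitor() {
  awk '
    { pid[NR] = $1; ppid[NR] = $2; cmd[NR] = $3 }
    # Remember the PID of every fdbmonitor process.
    $3 ~ /(^|\/)fdbmonitor$/ { monitor[$1] = 1 }
    END {
      for (i = 1; i <= NR; i++)
        if ((ppid[i] in monitor) && cmd[i] ~ /(^|\/)(fdbserver|backup_agent)$/)
          print cmd[i], "pid=" pid[i], "managed by fdbmonitor pid=" ppid[i]
    }'
}
```

Run it as ps -eo pid=,ppid=,comm= | fdb_children_of_monitor. If no backup_agent line is printed, fdbmonitor is not running an agent on that machine.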

Hi @gaurav
Below is the output of ps -ef in my case:

 ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 06:58 ?        00:00:00 sh -c fdbmonitor --conffile /var/dynamic-conf/fdbmonitor.conf --lockfile /var/fdb/fdbmonitor.lockfile
root         6     1  0 06:58 ?        00:00:00 fdbmonitor --conffile /var/dynamic-conf/fdbmonitor.conf --lockfile /var/fdb/fdbmonitor.lockfile
root         7     6  2 06:58 ?        00:06:04 /var/dynamic-conf/bin/6.2.11/fdbserver --class storage --cluster_file /var/fdb/data/fdb.cluster --datadir /var/fdb/data --knob_disable_posix_kernel_aio 1 --locality_instance_id 1 --locali
root       122     0  0 08:27 pts/0    00:00:00 bash

So it seems I have fdbmonitor running, and its PID is the same as fdbserver's PPID.

To resolve it, I changed the fdbmonitor.conf file to include a backup_agent section, because it seems backup_agent is not started by default. So insert the snippet below into your fdbmonitor.conf file.

[backup_agent]
command = /usr/bin/backup_agent -C <fdb-cluster-file>

[backup_agent.1]

In my case fdb-cluster-file is at /var/dynamic-conf/fdb.cluster. After that you don't have to restart the service, because fdbmonitor watches this directory and reloads the processes whenever anything changes. After making these changes I was able to take the backup successfully.

When you’re running FDB through the Kubernetes Operator, it doesn’t start backup agents. You can start the backup agents through a deployment and have them get the cluster file from the config map that the operator creates. We should add an example of this pattern to the documentation.

We should also consider integrating backup management into the operator.

Hey @john_brownlee
That makes sense, but I was somehow able to take a backup, as mentioned in my post above. The problem now is that I am getting some issues while restoring that backup. I have created a post about that; can you please look into it when you have time?

Hi @john_brownlee
Sorry to bother you again about this. I started experiencing the same issue again today, and when I do exactly what I described above (changing the fdbmonitor.conf file to add a backup_agent section), it doesn't work, even though it worked as expected yesterday. When I change the file to add the content below and save it:

[backup_agent]
command = /usr/bin/backup_agent -C <fdb-cluster-file>

[backup_agent.1]

below is the output that I get in the pod’s logs:

Time="1576579785.791398" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf file /var/dynamic-conf/fdbmonitor.conf
Time="1576579785.791595" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf dir /var/dynamic-conf/ (18)
Time="1576579785.791615" Severity="10" LogGroup="default" Process="fdbmonitor": Loading configuration /var/dynamic-conf/fdbmonitor.conf
Time="1576579785.791903" Severity="10" LogGroup="default" Process="fdbmonitor": Updated configuration for fdbserver.1

Can you please suggest how to go about taking a backup of the database?

In general, I would recommend running the backup agents through a Deployment rather than through fdbmonitor.

Yeah, but even if we mount the config map into the new backup_agent deployment and provide the FDB cluster file using the -r flag, do you think fdbbackup will be able to back up the data from the FoundationDB pod? Have you tested that?
One more thing: if my database is running in the operator's pods and I want to run fdbbackup against that remote database, I am not on the pod where my cluster file and other configuration are. How can we achieve that? Is there a flag for fdbbackup that makes it back up a remote database?

Also as a side note, it shouldn’t be necessary to specify the cluster file in the command part of your foundationdb.conf file. If the correct cluster file is listed in the general section, I think it will automatically get picked up by the process. If not, you should be able to specify it in the backup section in the same way:

[backup_agent]
command = /usr/bin/backup_agent
cluster_file = <path_to_cluster_file>

The backup agents get their data by connecting to the cluster, rather than reading off of a disk, so they do not need to be in the same pod or on the same machine as the fdbserver processes. All you need to provide to backup_agent and fdbbackup is the path to the cluster file.
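To make that concrete, here is a minimal sketch of submitting a backup from a machine that only has a copy of the cluster file. The function name is hypothetical and fdbbackup is assumed to be on your PATH. Keep in mind that the agents are the ones writing the backup data, so the destination URL has to be reachable from wherever the agents run.

```shell
# Submit a backup from any machine that has a copy of the cluster file.
# Arguments: <cluster-file> <backup-url> [tag]
submit_backup() {
  fdbbackup start -C "$1" -d "$2" -t "${3:-default}"
}
```

For example: submit_backup /path/to/fdb.cluster file:///path/to/backups, where both paths are placeholders for your own.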

Yeah, I understood that, but the problem is this: let's say I run the command below to start backup_agent on another pod:

/usr/bin/backup_agent -C /var/dynamic-conf/fdb.cluster

Here -C provides the mechanism to pass the cluster file. My confusion is: where is backup_agent going to look for the file /var/dynamic-conf/fdb.cluster? backup_agent will not be able to find that file on the pod where it is running.

You will need to mount the cluster file into the container where the backup agent is running. You should be able to mount the same config map that the operator creates for the fdbserver processes, and get the cluster file from there.
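A rough sketch of what such a deployment could look like, written as a shell function that prints the manifest. Everything named here is an assumption to adapt: the function name, the "<cluster-name>-config" config map name, the mount path, and an image that has backup_agent at /usr/bin/backup_agent. The cluster-file key is the one visible in the operator's config map; a key in a mounted config map shows up as a file of the same name.

```shell
# Print a Deployment manifest that runs backup agents for one cluster.
# Arguments: <cluster-name> <image>
backup_agent_manifest() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $1-backup-agents
spec:
  replicas: 2
  selector:
    matchLabels:
      app: $1-backup-agents
  template:
    metadata:
      labels:
        app: $1-backup-agents
    spec:
      containers:
        - name: backup-agent
          image: $2
          # The cluster-file key of the config map appears as a file in the mount.
          command: ["/usr/bin/backup_agent", "-C", "/mnt/config-volume/cluster-file"]
          volumeMounts:
            - name: config-volume
              mountPath: /mnt/config-volume
      volumes:
        - name: config-volume
          configMap:
            # Assumed name; check which config map the operator created for you.
            name: $1-config
EOF
}
```

You would then pipe it to kubectl, for example: backup_agent_manifest my-cluster my-fdb-image | kubectl apply -f - (both arguments are placeholders).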

Hi @john_brownlee
I understood that when we first discussed it, but after looking into that config map I had some doubts about how backup_agent would figure out where the cluster actually is. Having looked at the file again, I think the entry below is how backup_agent will find the cluster. Thanks

cluster-file: |
    # DO NOT EDIT!
    # This file is auto-generated, it is not to be edited by hand
    foundationdbcluster_sample:VPnLf5797TLX6qSQeACWSSdt6k7OHYGO@10.244.0.84:4500:tls,10.244.0.90:4500:tls,10.244.0.91:4500:tls
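For reference, the cluster file line has the form description:ID@address,address,... and the part after the @ is the list of coordinators that a client such as backup_agent connects to. A small sketch that pulls those addresses out (the function name is made up for illustration):

```shell
# Print the coordinator addresses from cluster-file text on stdin, one per line.
list_coordinators() {
  # Drop comment lines, keep the connection string, strip "description:ID@",
  # then split the comma-separated address list.
  grep -v '^[[:space:]]*#' | grep '@' | sed 's/^[^@]*@//' | tr ',' '\n'
}
```

Usage: list_coordinators < /var/dynamic-conf/fdb.cluster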
  

Can you suggest which binaries should be present in the other pod that we are going to spin up for backup_agent? Any documentation link would help.

We have more documentation on backup in general here: https://apple.github.io/foundationdb/backups.html