fdbbackup does not create backup files and backup_worker_enabled is always 0

Hi All,

I have a 5-node cluster (FDB version 7.1.5) running in triple redundancy mode, where one of the nodes is configured with class "backup" for dedicated backup operation.
I am unable to create a backup with the fdbbackup command to a local directory; each node has the same backup directory. I have tried the following fdbbackup options, and none of them generate backup files. I don't see any errors in the log file either.

fdbbackup start -C /etc/foundationdb/fdb.cluster --partitioned-log-experimental -t testtag --log --logdir /home/fdb/fdbbackuplog/ -d file:///home/fdb/foundationbkp --tls-verify-peers "Check.Valid=0" --tls-certificate-file /etc/foundationdb/pki/client-server.crt --tls-key-file /etc/foundationdb/pki/client-server.key --tls-ca-file /etc/foundationdb/pki/ca.crt -w &

fdbbackup start -C /etc/foundationdb/fdb.cluster --log --logdir /home/fdb/fdbbackuplog/ -d file:///home/fdb/foundationbkp --tls-verify-peers "Check.Valid=0" --tls-certificate-file /etc/foundationdb/pki/client-server.crt --tls-key-file /etc/foundationdb/pki/client-server.key --tls-ca-file /etc/foundationdb/pki/ca.crt -w &
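Both commands end with `&`, so the shell puts fdbbackup in the background and its exit status and error output are easy to lose. As an illustration only, here is a minimal Python sketch that assembles the same options as an argument list (no shell quoting is needed for `Check.Valid=0`) and runs the command in the foreground so the return code and stderr can be checked. `TlsOptions`, `build_fdbbackup_start` and `run_in_foreground` are hypothetical helpers; the flags are exactly the ones used in the commands above.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlsOptions:
    """TLS settings mirroring the --tls-* flags used in the commands above."""
    verify_peers: str
    certificate_file: str
    key_file: str
    ca_file: str


def build_fdbbackup_start(cluster_file: str, dest_url: str, log_dir: str,
                          tls: TlsOptions, tag: Optional[str] = None,
                          partitioned: bool = False) -> List[str]:
    """Return the argv for `fdbbackup start` (hypothetical helper)."""
    argv = ["fdbbackup", "start", "-C", cluster_file]
    if partitioned:
        argv.append("--partitioned-log-experimental")
    if tag is not None:
        argv += ["-t", tag]
    argv += ["--log", "--logdir", log_dir, "-d", dest_url]
    argv += [
        "--tls-verify-peers", tls.verify_peers,
        "--tls-certificate-file", tls.certificate_file,
        "--tls-key-file", tls.key_file,
        "--tls-ca-file", tls.ca_file,
    ]
    argv.append("-w")  # kept from the original commands
    return argv


def run_in_foreground(argv: List[str]) -> int:
    """Run the command without backgrounding it and report any failure."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        print("fdbbackup failed:", result.stderr.strip())
    return result.returncode
```

For example, `run_in_foreground(build_fdbbackup_start("/etc/foundationdb/fdb.cluster", "file:///home/fdb/foundationbkp", "/home/fdb/fdbbackuplog/", tls, tag="testtag", partitioned=True))`. Because `-w` is kept, the call may block for a long time.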

Status Details:

Configuration:
Redundancy mode - triple
Storage engine - ssd-2
Coordinators - 5
Desired Commit Proxies - 6
Desired GRV Proxies - 6
Desired Resolvers - 1
Desired Logs - 2
Usable Regions - 2
Regions:
Primary -
Datacenter - DC1
Satellite datacenters - DC2, DC3
Satellite Redundancy Mode - one_satellite_triple
Remote -
Datacenter - DC3
Satellite datacenters - DC2, DC1
Satellite Redundancy Mode - one_satellite_triple

Cluster:
FoundationDB processes - 104
Zones - 13
Machines - 13
Memory availability - 7.9 GB per process on machine with least available
Fault Tolerance - 2 zones
Server time - 07/11/22 20:56:16

Data:
Replication health - Healthy
Moving data - 0.000 GB
Sum of key-value sizes - 1.985 GB
Disk space used - 25.611 GB

Operating space:
Storage server - 1805.1 GB free on most full server
Log server - 1806.6 GB free on most full server

Workload:
Read rate - 23 Hz
Write rate - 0 Hz
Transactions started - 7 Hz
Transactions committed - 0 Hz
Conflict rate - 0 Hz

Backup and DR:
Running backups - 0
Running DRs - 0

Process performance details:
10.0.0.10:4500:tls ( 4% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 5.7 GB / 8.0 GB RAM )
10.0.0.10:4501:tls ( 3% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.0.10:4502:tls ( 4% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.10:4503:tls ( 2% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.10:4504:tls ( 2% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 1.9 GB / 8.0 GB RAM )
10.0.0.10:4505:tls ( 2% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.10:4506:tls ( 3% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.0.10:4507:tls ( 3% cpu; 3% machine; 0.005 Gbps; 2% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.0.11:4500:tls ( 2% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 5.1 GB / 8.0 GB RAM )
10.0.0.11:4501:tls ( 2% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.11:4502:tls ( 2% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.0.11:4503:tls ( 2% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.11:4504:tls ( 3% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.11:4505:tls ( 10% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.11:4506:tls ( 3% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.11:4507:tls ( 2% cpu; 4% machine; 0.006 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.12:4500:tls ( 2% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.0.12:4501:tls ( 2% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.12:4502:tls ( 2% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 1.4 GB / 8.0 GB RAM )
10.0.0.12:4503:tls ( 3% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.12:4504:tls ( 2% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.0.12:4505:tls ( 2% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.12:4506:tls ( 2% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.12:4507:tls ( 2% cpu; 2% machine; 0.003 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.13:4500:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.0.13:4501:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.0.13:4502:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.0.13:4503:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.0.13:4504:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.13:4505:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.0.13:4506:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 5.2 GB / 8.0 GB RAM )
10.0.0.13:4507:tls ( 2% cpu; 2% machine; 0.003 Gbps; 1% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.0.14:4500:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.14:4501:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.14:4502:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.14:4503:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.14:4504:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.14:4505:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.14:4506:tls ( 1% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.3 GB / 7.9 GB RAM )
10.0.0.14:4507:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4500:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4501:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4502:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4503:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4504:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4505:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4506:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.0.15:4507:tls ( 0% cpu; 0% machine; 0.000 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.1.10:4500:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.1.10:4501:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.1.10:4502:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.1.10:4503:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.1.10:4504:tls ( 1% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.3 GB / 7.9 GB RAM )
10.0.1.10:4505:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.1.10:4506:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.1.10:4507:tls ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 7.9 GB RAM )
10.0.2.10:4500:tls ( 1% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.10:4501:tls ( 1% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.2.10:4502:tls ( 1% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.9 GB / 8.0 GB RAM )
10.0.2.10:4503:tls ( 1% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.10:4504:tls ( 0% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.2 GB / 8.0 GB RAM )
10.0.2.10:4505:tls ( 2% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.2.10:4506:tls ( 2% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.10:4507:tls ( 3% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.5 GB / 8.0 GB RAM )
10.0.2.11:4500:tls ( 1% cpu; 1% machine; 0.002 Gbps; 3% disk IO; 1.9 GB / 8.0 GB RAM )
10.0.2.11:4501:tls ( 3% cpu; 1% machine; 0.002 Gbps; 3% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.11:4502:tls ( 1% cpu; 1% machine; 0.002 Gbps; 3% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.11:4503:tls ( 0% cpu; 1% machine; 0.002 Gbps; 1% disk IO; 0.2 GB / 8.0 GB RAM )
10.0.2.11:4504:tls ( 0% cpu; 1% machine; 0.002 Gbps; 1% disk IO; 0.2 GB / 8.0 GB RAM )
10.0.2.11:4505:tls ( 1% cpu; 1% machine; 0.002 Gbps; 3% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.2.11:4506:tls ( 3% cpu; 1% machine; 0.002 Gbps; 3% disk IO; 0.5 GB / 8.0 GB RAM )
10.0.2.11:4507:tls ( 1% cpu; 1% machine; 0.002 Gbps; 3% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.12:4500:tls ( 2% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.12:4501:tls ( 2% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.2.12:4502:tls ( 2% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.12:4503:tls ( 2% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.12:4504:tls ( 2% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 1.9 GB / 8.0 GB RAM )
10.0.2.12:4505:tls ( 2% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.2.12:4506:tls ( 2% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.12:4507:tls ( 3% cpu; 2% machine; 0.003 Gbps; 3% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.2.13:4500:tls ( 1% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.13:4501:tls ( 2% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.13:4502:tls ( 2% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.13:4503:tls ( 3% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.13:4504:tls ( 2% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.13:4505:tls ( 2% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.13:4506:tls ( 2% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.13:4507:tls ( 3% cpu; 2% machine; 0.002 Gbps; 2% disk IO; 1.8 GB / 8.0 GB RAM )
10.0.2.14:4500:tls ( 1% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 0.4 GB / 8.0 GB RAM )
10.0.2.14:4501:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.14:4502:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.14:4503:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.2 GB / 8.0 GB RAM )
10.0.2.14:4504:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.14:4505:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.2 GB / 8.0 GB RAM )
10.0.2.14:4506:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.14:4507:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.15:4500:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 0.3 GB / 8.0 GB RAM )
10.0.2.15:4501:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.15:4502:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.15:4503:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.3 GB / 8.0 GB RAM )
10.0.2.15:4504:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 3.1 GB / 8.0 GB RAM )
10.0.2.15:4505:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.2.15:4506:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.6 GB / 8.0 GB RAM )
10.0.2.15:4507:tls ( 2% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 1.8 GB / 8.0 GB RAM )

Coordination servers:
10.0.0.11:4507:tls (reachable)
10.0.0.14:4506:tls (reachable)
10.0.2.12:4500:tls (reachable)
10.0.2.13:4503:tls (reachable)
10.0.2.15:4500:tls (reachable)

Backup Node Config:

foundationdb.conf

## Configuration file for FoundationDB server processes
## Full documentation is available at
## Configuration - FoundationDB 7.1

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60

## by default, restart_backoff = restart_delay_reset_interval = restart_delay

initial_restart_delay = 0

restart_backoff = 60

restart_delay_reset_interval = 60

cluster_file = /etc/foundationdb/fdb.cluster

delete_envvars =

kill_on_configuration_change = true

## Default parameters for individual fdbserver processes

[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID:tls
listen_address = public
datadir = /mount/nvme0n1/foundationdb/$ID
logdir = /var/log/foundationdb

logsize = 10MiB

maxlogssize = 100MiB

machine_id =

tls_certificate_file = /etc/foundationdb/pki/client-server.crt
tls_key_file = /etc/foundationdb/pki/client-server.key
#tls_verify_peers TLS_verify_peers FDB_TLS_VERIFY_PEERS The byte-string for the verification of peer certificates and sessions
#tls_password TLS_password FDB_TLS_PASSWORD The byte-string representing the passcode for unencrypting the private key
tls_ca_file = /etc/foundationdb/pki/ca.crt
tls_verify_peers = Check.Valid=0

locality-machineid = vm-srai-fdb-benchmark-bkp-primary-az1-4
#locality-zoneid =
#locality-data-hall =
locality-dcid = DC1

io-trust-seconds = 30

class =

memory = 8GiB

storage_memory = 1GiB

cache_memory = 2GiB

metrics_cluster =

metrics_prefix =

## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section

[fdbserver.4500]
class = backup
cache_memory = 4GiB
[fdbserver.4501]
class = backup
cache_memory = 4GiB
[fdbserver.4502]
class = backup
cache_memory = 4GiB
[fdbserver.4503]
class = backup
cache_memory = 4GiB
[fdbserver.4504]
class = backup
cache_memory = 4GiB
[fdbserver.4505]
class = backup
cache_memory = 4GiB
[fdbserver.4506]
class = backup
cache_memory = 4GiB
[fdbserver.4507]
class = backup
cache_memory = 4GiB

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]
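As I understand it, `class = backup` only sets the process class that these fdbserver processes advertise to the cluster; the program that actually executes backups is the separate backup_agent, which fdbmonitor launches once per `[backup_agent.N]` section. A small Python sketch that summarizes a foundationdb.conf-style file under a simplified inheritance model (an assumption: a `[fdbserver.ID]` section inherits from `[fdbserver]` and overrides it, and `[general]` is ignored). `summarize_conf` is a hypothetical helper, and `configparser` is used only because the file format is INI-like.

```python
import configparser
from typing import Dict, List


def summarize_conf(text: str) -> Dict[str, object]:
    """Report the effective class of each [fdbserver.ID] section and list
    the [backup_agent.ID] sections (simplified model, see lead-in)."""
    # interpolation=None so values such as "auto:$ID:tls" are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    defaults = dict(parser["fdbserver"]) if parser.has_section("fdbserver") else {}
    classes: Dict[str, str] = {}
    agents: List[str] = []
    for section in parser.sections():
        if section.startswith("fdbserver."):
            # Per-process keys override the [fdbserver] defaults
            effective = {**defaults, **dict(parser[section])}
            classes[section] = effective.get("class", "").strip() or "unset"
        elif section.startswith("backup_agent."):
            agents.append(section)
    return {"classes": classes, "backup_agents": agents}
```

Run against the node's actual foundationdb.conf (with its comment lines intact), it would be expected to report `backup` for every `[fdbserver.45xx]` section and a single agent section, `backup_agent.1`.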

Configuration:

"configuration" : {
    "backup_worker_enabled" : 0,
    "blob_granules_enabled" : 0,
    "commit_proxies" : 6,
    "coordinators_count" : 5,
    "excluded_servers" : [
    ],
    "grv_proxies" : 6,
    "log_spill" : 2,
    "logs" : 2,
    "perpetual_storage_wiggle" : 0,
    "perpetual_storage_wiggle_locality" : "0",
    "proxies" : 12,
    "redundancy_mode" : "triple",
    "regions" : [
        {
            "datacenters" : [
                {
                    "id" : "DC1",
                    "priority" : 1
                },
                {
                    "id" : "DC2",
                    "priority" : 2,
                    "satellite" : 1,
                    "satellite_logs" : 9
                },
                {
                    "id" : "DC3",
                    "priority" : 1,
                    "satellite" : 1,
                    "satellite_logs" : 3
                }
            ],
            "satellite_redundancy_mode" : "one_satellite_triple"
        },
        {
            "datacenters" : [
                {
                    "id" : "DC3",
                    "priority" : 0
                },
                {
                    "id" : "DC2",
                    "priority" : 2,
                    "satellite" : 1,
                    "satellite_logs" : 9
                },
                {
                    "id" : "DC1",
                    "priority" : 1,
                    "satellite" : 1,
                    "satellite_logs" : 3
                }
            ],
            "satellite_redundancy_mode" : "one_satellite_triple"
        }
    ],
    "resolvers" : 1,
    "storage_engine" : "ssd-2",
    "storage_migration_type" : "disabled",
    "tenant_mode" : "disabled",
    "usable_regions" : 2
}

Notice that the status details always show "backup_worker_enabled" : 0 and "Running backups - 0", even when I run the fdbbackup command with the --partitioned-log-experimental option.
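Both values are easier to check in machine-readable form with `fdbcli --exec 'status json'`. A minimal Python sketch that takes the parsed JSON (for example `json.loads` of that command's output) and pulls out the two fields. The key paths are assumptions: the configuration object shown above is taken to sit under `cluster.configuration`, and running backups are assumed to appear under `cluster.layers.backup.tags`, each tag carrying a `running_backup` flag; verify them against the status output of your version. `backup_summary` is a hypothetical helper.

```python
from typing import Any, Dict, List


def backup_summary(status: Dict[str, Any]) -> Dict[str, Any]:
    """Extract backup-related fields from a parsed `status json` document.

    Assumed layout: cluster.configuration for the database configuration,
    cluster.layers.backup.tags.<tag>.running_backup for active backups.
    """
    cluster = status.get("cluster", {})
    config = cluster.get("configuration", {})
    tags = cluster.get("layers", {}).get("backup", {}).get("tags", {})
    # Tags whose running_backup flag is set, in a stable order
    running: List[str] = sorted(
        name for name, info in tags.items() if info.get("running_backup")
    )
    return {
        "backup_worker_enabled": config.get("backup_worker_enabled", 0),
        "running_backup_tags": running,
    }
```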

Any pointers on what is missing? Any idea how to enable or configure backup workers? Are any other configuration changes needed?

Thanks
–srai

Do you have "backup_agent" running? Basically, fdbbackup start sends a transaction to FDB, which triggers the long-running "backup_agent" to do the actual work.
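Since `fdbbackup start` only records the request, nothing is written until at least one backup_agent connected to the same cluster picks it up. A minimal Python sketch for checking that from parsed `status json`, assuming agents are counted under `cluster.layers.backup.instances_running` (an assumption about the status layout, not something stated in this thread); `backup_agents_running` and `explain` are hypothetical helpers.

```python
from typing import Any, Dict


def backup_agents_running(status: Dict[str, Any]) -> int:
    """Return the number of backup agents the cluster reports (0 if none).

    Assumes the count lives at cluster.layers.backup.instances_running;
    a missing backup layer is treated as "no agent has reported in".
    """
    backup_layer = status.get("cluster", {}).get("layers", {}).get("backup", {})
    return int(backup_layer.get("instances_running", 0))


def explain(status: Dict[str, Any]) -> str:
    """Turn the agent count into a short human-readable diagnosis."""
    n = backup_agents_running(status)
    if n == 0:
        return "no backup_agent is reporting; fdbbackup start will only queue work"
    return f"{n} backup_agent process(es) reporting"
```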

@jzhou The backup agent was running, but it was using an incorrect parameter, and there were no errors in the logs. It took some time to find the cause. Thanks for the reply.
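Because the misconfigured agent produced no errors, it may be worth looking below error level as well. FoundationDB trace files are XML, one `<Event .../>` per line, with a numeric `Severity` attribute (10 info, 20 warning, 30 warn-always, 40 error). A minimal Python sketch that lists events at or above a chosen severity from the trace files in a log directory such as `/var/log/foundationdb` or `/home/fdb/fdbbackuplog/`. It assumes the default XML trace format, with `Severity` appearing before `Type` on each line and file names starting with `trace`; both are assumptions, and `notable_events` and `scan_logdir` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Captures Severity and Type from one <Event .../> line
# (attribute order is an assumption of this sketch).
EVENT_RE = re.compile(r'<Event\b[^>]*\bSeverity="(\d+)"[^>]*\bType="([^"]+)"')


def notable_events(lines: Iterable[str], min_severity: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (severity, event type) for events at or above min_severity."""
    for line in lines:
        m = EVENT_RE.search(line)
        if m and int(m.group(1)) >= min_severity:
            yield int(m.group(1)), m.group(2)


def scan_logdir(logdir: str, min_severity: int = 20) -> None:
    """Print notable events from every trace*.xml file in logdir."""
    for path in sorted(Path(logdir).glob("trace*.xml")):
        with path.open(errors="replace") as f:
            for sev, etype in notable_events(f, min_severity):
                print(f"{path.name}: Severity={sev} Type={etype}")
```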