"backup failure: Could not create backup container: Operation timed out" with version 6.3.9

I am using a Ubuntu 18.04 vm created by Vagrant and then deploying foundationdb-clients_6.3.9-1_amd64.deb and foundationdb-server_6.3.9-1_amd64.deb. I have not configured TLS on foundationdb.

After deployment , I used fdbcli to enter a few key values

This is the command I used for starting a backup:

fdbbackup start -d blobstore://key:secret@s3.us-east-1.amazonaws.com/fb1?bucket=foundationdbbackups --log -t test1
console output:

ERROR: Could not create backup container: Operation timed out
ERROR: An error was encountered during submission
Fatal Error: Backup error

Trace log

<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1621247820.009499" Type="Net2Starting" ID="0000000000000000" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.010484" Type="ProgramStart" ID="0000000000000000" SourceVersion="cb52535f4699d30beceb7ad6220ab8d7299da946" Version="6.3.9" PackageName="6.3" ActualTime="1621247820" CommandLine="fdbbackup start -d blobstore://SECRETKEY:SECRETPASSWORD@s3.us-east-1.amazonaws.com/fb1?bucket=foundationdbbackups --log -t test1" MemoryLimit="8589934592" TrackLatestType="Original" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="ClientStart" ID="0000000000000000" SourceVersion="cb52535f4699d30beceb7ad6220ab8d7299da946" Version="6.3.9" PackageName="6.3" ClusterFile="/etc/foundationdb/fdb.cluster" ConnectionString="myoWx5ce:6fcOBCfl@192.168.49.3:4500" ActualTime="1621247820" ApiVersion="-1" ImageOffset="(nil)" Machine="192.168.49.3:6423" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1621247820.009563" Type="MachineLoadDetail" ID="0000000000000000" User="2106" Nice="246" System="1678" Idle="349710" IOWait="300" IRQ="0" SoftIRQ="199" Steal="0" Guest="0" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="Net2TLSConfig" ID="0000000000000000" CAPath="" CertificatePath="" KeyPath="" HasPassword="0" VerifyPeers="Check.Valid=1" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="NotifyAddressHealthy" ID="0000000000000000" SuppressedEventCount="0" Address="192.168.49.3:4500" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="FBA_SubmitBackup" ID="0000000000000000" TagName="test1" StopWhenDone="1" UsePartitionedLog="0" OutContainer="blobstore://SECRETKEY:SECRETPASSWORD@s3.us-east-1.amazonaws.com/fb1?bucket=foundationdbbackups" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="CodeCoverage" ID="0000000000000000" File="/opt/foundation/__sourcedir_this_is_some_very_long_name_dir__/foundationdb/fdbclient/ReadYourWrites.actor.cpp" Line="1227" Condition="true" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="StartingRunLoopProfilingThread" ID="0000000000000000" Interval="0.125" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="Net2Running" ID="0000000000000000" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.009563" Type="ProcessTimeOffset" ID="0000000000000000" ProcessTime="1621247820.018830" SystemTime="1621247820.018831" OffsetFromSystemTime="-0.000001" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.018920" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500" PeerReferences="2" FailureStatus="OK" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.018920" Type="ConnectionExchangingConnectPacket" ID="83b2df1f024008bc" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.020564" Type="ConnectionEstablished" ID="83b2df1f024008bc" SuppressedEventCount="0" Peer="192.168.49.3:4500" ConnectionId="1" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.020564" Type="ConnectedOutgoing" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247820.169923" Type="TLSPolicyFailure" ID="0000000000000000" SuppressedEventCount="0" Reason="preverification failed" VerifyError="unable to get local issuer certificate" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247820.169923" Type="N2_ConnectHandshakeError" ID="3371275772f14da4" SuppressedEventCount="0" ErrorCode="337047686" Message="certificate verify failed" WhichMeans="error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247820.169923" Type="BlobStoreEndpointRequestFailedRetryable" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" ConnectionEstablished="0" RemoteHost="s3.us-east-1.amazonaws.com" Verb="HEAD" Resource="/foundationdbbackups" ThisTry="1" RetryDelay="2" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247820.513104" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621247820.394574" Trace="addr2line -e fdbbackup.debug -p -C -f -i 0x7f12da842980 0x7f12da83f3b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f12d9ebebf7 0x4d5905" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247820.763396" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621247820.646061" Trace="addr2line -e fdbbackup.debug -p -C -f -i 0x7f12da842980 0x7f12da83f3b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f12d9ebebf7 0x4d5905" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247820.927749" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621247820.897105" Trace="addr2line -e fdbbackup.debug -p -C -f -i 0x7f12da842980 0x7f12da83f3b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f12d9ebebf7 0x4d5905" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247821.267976" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621247821.148502" Trace="addr2line -e fdbbackup.debug -p -C -f -i 0x7f12da842980 0x7f12da83f3b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f12d9ebebf7 0x4d5905" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247821.518624" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621247821.399256" Trace="addr2line -e fdbbackup.debug -p -C -f -i 0x7f12da842980 0x7f12da83f3b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f12d9ebebf7 0x4d5905" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247821.769566" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621247821.650311" Trace="addr2line -e fdbbackup.debug -p -C -f -i 0x7f12da842980 0x7f12da83f3b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f12d9ebebf7 0x4d5905" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247822.170152" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621247822.152497" Trace="addr2line -e fdbbackup.debug -p -C -f -i 0x7f12da842980 0x7f12da83f3b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f12d9ebebf7 0x4d5905" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="10" Time="1621247822.298873" Type="TLSPolicyFailure" ID="0000000000000000" SuppressedEventCount="0" Reason="preverification failed" VerifyError="unable to get local issuer certificate" Machine="192.168.49.3:6423" LogGroup="default" />
<Event Severity="20" Time="1621247822.298873" Type="N2_ConnectHandshakeError" ID="f69cac0eda141063" SuppressedEventCount="0" ErrorCode="337047686" Message="certificate verify failed" WhichMeans="error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed" Machine="192.168.49.3:6423" LogGroup="default" />

Another quirk I discovered was that is I append “&sc=0” like this

fdbbackup start -d blobstore://key:secret@s3.us-east-1.amazonaws.com/fb1?bucket=foundationdbbackups&sc=0 --log -t test1

output:
--log: command not found

Pleae note I also tried backup with version 6.3.12 and got identical failure.

Does TLS have to be enabled for backup/restore to blobstore ? can you please share a working configuration that I can try to replicate.

Can you confirm if /etc/foundationdb/foundationdb.conf is pointing to a backup_agent executable that actually exists? There is a bug on the release-6.3 branch (fixed by Fix several package issues (release-6.3) by sfc-gh-anoyes · Pull Request #4810 · apple/foundationdb · GitHub) that puts backup_agent in the wrong place when you install the deb file. Basically what we should see is that if your foundationdb.conf has a section like this:

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent

Then /usr/lib/foundationdb/backup_agent/backup_agent should exist and be the backup_agent executable

Hi Andrew
I just verfied that /usr/lib/foundationdb/backup_agent/backup_agent does exist and it is executable

vagrant@ubuntu1804:~/op$ ls -all /usr/lib/foundationdb/backup_agent/backup_agent
lrwxrwxrwx 1 root root 22 Oct  7  2020 /usr/lib/foundationdb/backup_agent/backup_agent -> ../../../bin/fdbbackup

The & is interpreted by bash as if you wrote

$ fdbbackup start -d blobstore://key:secret@s3.us-east-1.amazonaws.com/fb1?bucket=foundationdbbackups &
$ sc=0 --log -t test1

sc=0 is then setting an environment variable before running the command --log, hence the error that you’re seeing.

You need to put quotes around the blobstore:// URL to make bash not do this.

Thank you ! That fixed it.

Note to all who may find this thread: I was trying with FoundationDB ver 6.3.9 on Ubuntu 18.04 with AWS S3 . I have not enabled TLS on this.

This command worked:
fdbbackup start -d “blobstore://key:secret@s3.us-east-1.amazonaws.com/fb1?bucket=foundationdbbackups&sc=0” --log -t test1

Is there a way to download backup_agent binary separately? I tried to search for ‘backup_agent’ locally and couldn’t find anything. We have 6.3.13 and it’s impossible to download a newer version as the website is down for around a week already.

Nevermind. ‘backup_agent’ is just a symlink to fdbbackup. The same for fdbrestore which also doesn’t exist.