Fatal Error: Backup error

Hi Everyone,

I’m new to FDB and am learning what I can as I go. I have FDB running on a cluster of 6 production servers and am attempting to run the fdbbackup cmd. My cmd and output are below. Thanks in advance for any help/insight. I’ve read other posts that seem related to this issue, but none of them had a solution. Apologies if this is a double post.

Cmd:

$ fdbbackup start --log -d blobstore://<access>:<secret>@<spacesname>.nyc3.digitaloceanspaces.com:443/firsttry?bucket=fdbbucket

Error:

ERROR: Could not create backup container: Operation timed out
ERROR: An error was encountered during submission
Fatal Error: Backup error

Entries from the .xml trace log around the error:

<Event Severity="40" Time="1586813070.500949" Type="TLSOptionsSetCertFileError" ID="0000000000000000" Filename="/etc/foundationdb/fdb.pem" Error="file_not_readable" ErrorDescription="File could not be read" ErrorCode="1513" UnixErrorCode="2" UnixError="No such file or directory" Backtrace="addr2line -e fdbbackup.debug -p -C -f -i 0x9f0136 0x9ee88f 0x941541 0x9476ee 0x9485b9 0xa0b5ee 0xa0dbd7 0xa0e40b 0xa0d6d0 0xa0d8a3 0xa28978 0x478fd0 0xa154ef 0xa26b98 0xa2af25 0xa1d3c1 0xa1eb14 0xa18b1a 0xa19acb 0x74046e 0x423c15 0x7f59f15f8830" Machine="10.0.3.53:43991" LogGroup="default" />
<Event Severity="20" Time="1586813070.500949" Type="BlobStoreEndpointRequestFailedRetryable" ID="0000000000000000" Error="file_not_readable" ErrorDescription="File could not be read" ErrorCode="1513" SuppressedEventCount="0" ConnectionEstablished="0" RemoteHost="altabackup.nyc3.digitaloceanspaces.com" Verb="HEAD" Resource="/fdbbucket" ThisTry="1" RetryDelay="2" Machine="10.0.3.53:43991" LogGroup="default" />
<Event Severity="20" Time="1586813070.500949" Type="N2_ConnectError" ID="ce20c0deb751b8da" SuppressedEventCount="0" Message="125" Machine="10.0.3.53:43991" LogGroup="default" />
<Event Severity="10" Time="1586813072.017285" Type="GetLeaderReply" ID="0000000000000000" SuppressedEventCount="3" Coordinator="10.0.3.53:4500" Nominee="643b4318a006090b" Generation="0" Machine="10.0.3.53:43991" LogGroup="default" />
<Event Severity="10" Time="1586813072.509537" Type="TLSConnectionConnecting" ID="0000000000000000" SuppressedEventCount="0" ToAddr="162.243.189.2:443:tls" Machine="10.0.3.53:43991" LogGroup="default" />
<Event Severity="10" Time="1586813072.509537" Type="TLSConnectionSettingCertFile" ID="0000000000000000" CertFilePath="/etc/foundationdb/fdb.pem" Machine="10.0.3.53:43991" LogGroup="default" />
<Event Severity="40" Time="1586813072.509537" Type="TLSOptionsSetCertFileError" ID="0000000000000000" Filename="/etc/foundationdb/fdb.pem" Error="file_not_readable" ErrorDescription="File could not be read" ErrorCode="1513" UnixErrorCode="2" UnixError="No such file or directory" Backtrace="addr2line -e fdbbackup.debug -p -C -f -i 0x9f0136 0x9ee88f 0x941541 0x9476ee 0x9485b9 0xa0b5ee 0xa0dbd7 0xa0e40b 0xa0d6d0 0xa0d8a3 0xa28978 0x478fd0 0xa154ef 0xa26b98 0xa2af25 0xa1d3c1 0xa1eb14 0xa18b1a 0xa19acb 0x74046e 0x423c15 0x7f59f15f8830" Machine="10.0.3.53:43991" LogGroup="default" />
<Event Severity="20" Time="1586813072.509537" Type="N2_ConnectError" ID="d56d485c4f2df7a1" SuppressedEventCount="0" Message="125" Machine="10.0.3.53:43991" LogGroup="default" />

End Post

Something is wrong with your TLS configuration. It is trying to use /etc/foundationdb/fdb.pem as its TLS certificate but the file can’t be read. This might be a default location and the file just does not exist, I’m not sure.
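To get a quick overview of which error-level events a trace log contains, a filter along these lines may help. It is only a sketch: severe_event_types is a made-up helper name, the trace file name pattern in the example call is an assumption, and it relies on Severity="40" marking errors, as it does in the excerpt above.

```shell
# severe_event_types is a hypothetical helper, not part of FoundationDB.
# It lists the Type of every Severity="40" event in the given trace files,
# with a count per type.
severe_event_types() {
    grep -h 'Severity="40"' "$@" \
        | sed 's/.*Type="\([^"]*\)".*/\1/' \
        | sort \
        | uniq -c
}

# Example call (the file name pattern is an assumption; adjust it to your log directory):
# severe_event_types trace.*.xml
```

If TLSOptionsSetCertFileError is the only type listed, the missing certificate file is probably the first thing to chase.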

@SteavedHams Thanks. The directory exists, but it contains only 2 configuration files and no fdb.pem.

Not sure what the cert would be for. DigitalOcean doesn’t require one for Spaces use. Any ideas?

The cert is just for speaking HTTPS to the service, which is the default because backup data sent over HTTP will contain keys and values from your database in plain text. If your connections will only traverse trusted networks and the service you are using has an HTTP endpoint, then you can disable HTTPS by adding a URL parameter sc=0 or secure_connection=0. So in your case, for example, it would become

blobstore://<access>:<secret>@<spacesname>.nyc3.digitaloceanspaces.com/firsttry?bucket=fdbbucket&secure_connection=0

Note that I also took out the 443 port as that is normally used for SSL.

Understood, thank you.

I ran the cmd, as described, but I’m getting the same error msg.

According to your program start line, you forgot to include the &secure_connection=0 on your blobstore URL.

Just checked for &secure_connection=0 and ran it again. Same error.

Does anyone have any more ideas?

In all the other cases we’ve tested, adding &secure_connection=0 disables TLS. It’d be good to double-check your ProgramStart event to make sure it is getting set. I’m not sure why it’s not working for you, and if it’s something DigitalOcean-specific, we don’t have a way to test that.
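One way to check is to pull the ProgramStart event out of the trace file that fdbbackup writes when --log is given and look at the URL it recorded. The sketch below assumes the trace files are in the current directory and that the ProgramStart event records the command line; both function names are made up for illustration.

```shell
# Hypothetical helpers, not part of FoundationDB.

# Print every ProgramStart event found in the given trace files.
program_start_lines() {
    grep -h 'Type="ProgramStart"' "$@"
}

# Succeed only if a ProgramStart event mentions secure_connection=0,
# i.e. the parameter actually reached the fdbbackup process.
has_insecure_flag() {
    program_start_lines "$@" | grep -q 'secure_connection=0'
}

# Example call (the file name pattern is an assumption):
# has_insecure_flag trace.*.xml && echo "parameter was passed" || echo "parameter is missing"
```

If the parameter is missing from that event, the shell most likely dropped it before fdbbackup ever saw it.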

Are you escaping the & on the command line? In most shells you can’t pass a bare & in a command-line argument; it must be escaped as \& or placed inside a quoted string.
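A small demonstration of the difference, using a stand-in for fdbbackup so it is safe to run. show_args is a made-up helper, and ACCESS, SECRET and example.invalid are placeholders rather than real values.

```shell
# show_args is a hypothetical helper that prints each argument on its own line,
# so it is easy to see what the shell actually passed to the command.
show_args() { printf '%s\n' "$@"; }

# Single quotes keep & (and ?) literal, so the whole URL arrives as one argument.
quoted=$(show_args 'blobstore://ACCESS:SECRET@example.invalid/firsttry?bucket=fdbbucket&secure_connection=0')

# Backslash-escaping the & has the same effect for that character.
escaped=$(show_args blobstore://ACCESS:SECRET@example.invalid/firsttry?bucket=fdbbucket\&secure_connection=0)

echo "$quoted"
echo "$escaped"

# Without quotes or a backslash, the shell would end the command at the &,
# run it in the background, and then treat secure_connection=0 as a separate
# variable assignment, so the program would never see that parameter.
```

Quoting the whole URL is the safer habit, since it also protects the ? from filename expansion.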

I wasn’t aware, no. This is what I ended up running though:

fdbbackup start --log -d blobstore://<accesskey>:<secretkey>@https://<spacename>.nyc3.digitaloceanspaces.com/firsttry?bucket=<bucketname>\&secure_connection=0

Same error.

What is the result of env | grep FDB_TLS?

Somehow you seem to have FDB configured to open a TLS cert file that doesn’t exist on your system.

Are you using TLS to talk to your FDB cluster? Backup uses TLS for two things - to talk to the database cluster and to talk to the S3-like blob service. I was assuming you are not using TLS on your cluster.
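A rough way to check both possibilities at once is sketched below. It assumes FDB_TLS_CERTIFICATE_FILE is the environment variable that sets the certificate path and that /etc/foundationdb/fdb.pem (the path in the log above) is the fallback when it is unset; check_tls_cert is a made-up helper name.

```shell
# check_tls_cert is a hypothetical helper, not part of FoundationDB.
# It reports whether the certificate file that would be picked up is readable.
check_tls_cert() {
    # Assumption: the environment variable wins, otherwise fall back to the
    # path that appears in the trace log.
    cert="${FDB_TLS_CERTIFICATE_FILE:-/etc/foundationdb/fdb.pem}"
    if [ -r "$cert" ]; then
        echo "readable: $cert"
    else
        echo "not readable: $cert"
    fi
}

# List any TLS-related settings in the environment of this shell.
env | grep '^FDB_TLS' || echo "no FDB_TLS variables set"

check_tls_cert
```

If no variables are set and the file is reported as not readable, the path is probably just a built-in default rather than something configured on the machine.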

I’m not using TLS. No output from above cmd.

Boss had me abandon this method. Feel free to close this one out. Thanks for the help though everyone. Much appreciated.

Are you going to use the disk snapshot based backup or simply run without backup?

@mengxu I’ll be using the disk snapshot based backup in combination with s3cmd to a DOSpace.