Trying to restore a backup from S3

COran · December 29, 2020, 2:02pm

Hi Folks,

I’m trying to restore a backup from S3 to an empty FDB cluster. The context is the following: I have an FDB cluster running on k8s with a backup process running on it. It uploads successfully to S3 via MinIO.

fdbbackup list -b "blobstore://admin@127.0.0.1:9000?bucket=xxxx&sc=0" --blob_credentials=/etc/fdb/blob_cred.json
blobstore://admin@127.0.0.1:9000/blabla?bucket=xxxx&sc=0
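Both commands read the MinIO secret from the file passed with --blob_credentials. Below is a minimal sketch that generates such a file. It assumes the {"accounts": {"<key>@<host>": {"secret": ...}}} layout described in the FoundationDB backup documentation. The exact account-name form (with or without the port) may vary by version, so the sketch writes both spellings. The helper names and the secret value are hypothetical.

```python
import json
from pathlib import Path


def build_blob_credentials(key: str, host: str, port: int, secret: str) -> dict:
    """Build a credentials document in the {"accounts": {...}} layout.

    Assumption: accounts are named "<key>@<host>", possibly with ":<port>";
    both spellings are emitted because the lookup form may differ by version.
    """
    return {
        "accounts": {
            f"{key}@{host}": {"secret": secret},
            f"{key}@{host}:{port}": {"secret": secret},
        }
    }


def write_blob_credentials(path: str, creds: dict) -> None:
    """Write the document as JSON and restrict it to the owning user."""
    target = Path(path)
    target.write_text(json.dumps(creds, indent=2) + "\n")
    target.chmod(0o600)  # the file holds a secret key
```

For example: write_blob_credentials("/etc/fdb/blob_cred.json", build_blob_credentials("admin", "127.0.0.1", 9000, "<your secret>")). Every pod that talks to MinIO, not only the one running fdbbackup or fdbrestore, needs an equivalent file.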

Then I created another (empty) FDB cluster with the same settings. I’m able to reach S3 through MinIO from the pod where I want to run fdbrestore. But when I run the following command:

fdbrestore start -w --log -r "blobstore://admin@127.0.0.1:9000/blabla?bucket=xxxx&sc=0" --blob_credentials=/etc/fdb/blob_cred.json --dest_cluster_file=/etc/foundationdb/fdb.cluster

I get the following output when I check the status of the restore:

Tag: default  UID: 22edfe20b449050db4d04d90a42efc71  State: starting  Blocks: 0/0  BlocksInProgress: 0  Files: 0  BytesWritten: 0  ApplyVersionLag: 0  LastError: ''HTTP response code not received or indicated failure' on 'restore_start'' 274s ago.
  URL: blobstore://admin@127.0.0.1:9000/blabla?bucket=xxxx&sc=0  Range: ''-'\xff'  AddPrefix: ''  RemovePrefix: ''  Version: 1885133112347
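To watch a restore from a script instead of rereading this line by hand, you can pull a few fields out of the status text. Here is a minimal sketch. It assumes the single-line "Key: value" layout shown above, which may differ between FoundationDB versions. parse_restore_status and looks_stuck are hypothetical helper names, not part of any FDB tool.

```python
import re


def parse_restore_status(text: str) -> dict:
    """Extract a few fields from fdbrestore status text output.

    Assumption: the output uses the "Key: value" layout shown above.
    """
    def grab(pattern: str):
        match = re.search(pattern, text)
        return match.group(1) if match else None

    # "Blocks:" must be followed directly by the colon, so the
    # "BlocksInProgress:" field is not matched here.
    blocks = re.search(r"Blocks:\s+(\d+)/(\d+)", text)
    return {
        "tag": grab(r"Tag:\s+(\S+)"),
        "state": grab(r"State:\s+(\S+)"),
        "blocks_done": int(blocks.group(1)) if blocks else None,
        "blocks_total": int(blocks.group(2)) if blocks else None,
        # Everything between "LastError:" and the trailing "<N>s ago".
        "last_error": grab(r"LastError:\s+(.*?)\s+\d+s ago"),
    }


def looks_stuck(status: dict) -> bool:
    """Heuristic: still 'starting' with no blocks planned."""
    return status["state"] == "starting" and status["blocks_total"] == 0
```

State "starting" with Blocks: 0/0 is the pattern that, later in this thread, turned out to mean no backup_agent had picked up the job. Treat looks_stuck as a hint, not a diagnosis.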

I suspect an issue with MinIO, but I’m not sure how to debug it. I then checked the traces from when I started the restore and noticed some weird logs:

<Event Severity="10" Time="1609247923.979622" Type="ConnectedOutgoing" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.128.140.100:4500" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247923.986947" Type="BlobStoreEndpointNewConnection" ID="0000000000000000" SuppressedEventCount="0" RemoteEndpoint="127.0.0.1:9000" ExpiresIn="120" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247923.987410" Type="AsyncFileOpened" ID="0000000000000000" SuppressedEventCount="0" Filename="/etc/fdb/blob_cred.json" Fd="11" Flags="2228225" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247923.987633" Type="AsyncFileClosed" ID="0000000000000000" SuppressedEventCount="0" Fd="11" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247924.048588" Type="BlobStoreEndpointReusingConnected" ID="0000000000000000" SuppressedEventCount="0" RemoteEndpoint="127.0.0.1:9000" ExpiresIn="119.939" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247924.142139" Type="BackupContainerDescribe2" ID="0000000000000000" URL="blobstore://admin@127.0.0.1:9000/blabla?bucket=xxxx&amp;sc=0" LogStartVersionOverride="-1" ExpiredEndVersion="-1" UnreliableEndVersion="-1" LogBeginVersion="1315974786096" LogEndVersion="1896493112348" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247925.998811" Type="ConnectionClosed" ID="2a781cf640f78cf0" Error="connection_unreferenced" ErrorDescription="No peer references for connection" ErrorCode="1048" SuppressedEventCount="0" PeerAddr="10.128.140.100:4500" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247925.998811" Type="PeerDestroy" ID="0000000000000000" Error="connection_unreferenced" ErrorDescription="No peer references for connection" ErrorCode="1048" SuppressedEventCount="0" PeerAddr="10.128.140.100:4500" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="20" Time="1609247925.998811" Type="N2_ReadProbeError" ID="2a781cf640f78cf0" SuppressedEventCount="0" ErrorCode="125" Message="Operation canceled" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247926.538969" Type="AsyncFileOpened" ID="0000000000000000" SuppressedEventCount="3" Filename="/etc/fdb/blob_cred.json" Fd="15" Flags="2228225" Machine="10.136.138.248:7359" LogGroup="default" />
<Event Severity="10" Time="1609247926.538969" Type="AsyncFileClosed" ID="0000000000000000" SuppressedEventCount="3" Fd="15" Machine="10.136.138.248:7359" LogGroup="default" />

Can those errors explain the restore failure? Does this mean my coordinator pods are not reachable? That would be strange, because the status details output from fdbcli tells me that everything is OK.

Any debugging tips would be appreciated.
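One way to separate MinIO problems from everything else is to check the endpoint named in the blobstore URL directly. Here is a minimal sketch, assuming Python 3 on the pod that will talk to MinIO. parse_blobstore_url and endpoint_reachable are hypothetical helpers. Reading sc=0 as "plain HTTP, no TLS" is an assumption based on the secure_connection (sc) URL parameter in the FoundationDB backup documentation.

```python
import socket
from urllib.parse import parse_qs, urlsplit


def parse_blobstore_url(url: str) -> dict:
    """Split a blobstore:// URL into key, endpoint, backup name and parameters."""
    parts = urlsplit(url)
    if parts.scheme != "blobstore":
        raise ValueError(f"not a blobstore URL: {url!r}")
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {
        "key": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "resource": parts.path.lstrip("/"),  # backup name inside the bucket
        "bucket": params.get("bucket"),
        # Assumption: sc=0 disables TLS; anything else is treated as secure.
        "secure": params.get("sc", "1") != "0",
    }


def endpoint_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connection only proves the port is open, not that the key, secret, or bucket are right. Also, 127.0.0.1 resolves to whichever pod runs the check, so the probe is only meaningful from the pods that actually contact MinIO.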

SteavedHams · January 3, 2021, 10:15am

Those events are not errors: most of them are informational (Severity="10") and one is a warning (Severity="20").
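A quick way to triage a trace file like the one in the question is to group events by their Severity attribute. Here is a minimal sketch. It assumes the numeric levels used by FoundationDB's trace logging (10 = info, 20 = warning, 30 = warn-always, 40 = error) and one <Event .../> element per line, as in the excerpt above. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed FoundationDB trace severity levels; other values are shown as SevN.
SEVERITY_NAMES = {10: "Info", 20: "Warn", 30: "WarnAlways", 40: "Error"}


def classify_events(lines):
    """Yield (severity_name, event_type, error_attr) for each <Event .../> line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue  # skip the <Trace> wrapper and anything else
        attrs = ET.fromstring(line).attrib
        sev = int(attrs.get("Severity", "0"))
        yield SEVERITY_NAMES.get(sev, f"Sev{sev}"), attrs.get("Type"), attrs.get("Error")


def summarize(lines) -> Counter:
    """Count events per severity name; real failures show up as Error."""
    return Counter(name for name, _, _ in classify_events(lines))
```

An Error= attribute on a Severity 10 event, such as connection_unreferenced here, records why a connection was closed. It is not a failure by itself.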

I suspect you do not have any backup agents on your destination cluster.

fdbrestore only controls restore jobs; it does not do any of the restore work itself. As with backups, backup_agent does all of the restore work. You need at least one backup agent in your destination cluster with access to your backup data.
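To confirm whether any agents are connected to the destination cluster, you can inspect the machine-readable status. Here is a minimal sketch, assuming fdbcli is on the PATH and that your version's status json reports agents under cluster.layers.backup. The instances_running and instances field names are assumptions to verify against your own output, and the helper names are hypothetical.

```python
import json
import subprocess


def backup_agents_running(status: dict) -> int:
    """Count backup agents reported in fdbcli status json output.

    Assumption: agents appear under cluster.layers.backup; field names may
    vary by FoundationDB version, so fall back to counting instances.
    """
    backup = status.get("cluster", {}).get("layers", {}).get("backup", {})
    if "instances_running" in backup:
        return int(backup["instances_running"])
    return len(backup.get("instances", {}))


def fetch_status(cluster_file: str) -> dict:
    """Run fdbcli against the given cluster file and parse its JSON status."""
    result = subprocess.run(
        ["fdbcli", "-C", cluster_file, "--exec", "status json"],
        check=True,
        capture_output=True,
        text=True,
        timeout=60,
    )
    return json.loads(result.stdout)
```

For example, backup_agents_running(fetch_status("/etc/foundationdb/fdb.cluster")) returning 0 on the destination cluster would match the situation in this thread, where the restore sat in the starting state.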

COran · January 5, 2021, 12:58pm

Indeed, I reapplied my charts and my agents came up. My restore is finally working.

Thanks for your help.
