Fdbrestore with FDB 7.1

Restoring a FDB backup fails after updating to FDB 7.1.

Command line:

fdbrestore start --dest-cluster-file /etc/foundationdb/fdb.cluster -r file:///tmp/backup-2023-02-23-05-16-02.385417 --log

fdbrestore status remains with Blocks: 0/0:

Tag: default  UID: e09b8486ccf722ebf0e69325cc8039a2  State: queued  Blocks: 0/0  BlocksInProgress: 0  Files: 0  BytesWritten: 0  CurrentVersion: -1 FirstConsistentVersion: 0  ApplyVersionLag: 0  LastError: None  URL: file:///tmp/backup-2023-02-23-05-16-02.385417  Range: ''-'\xff'  AddPrefix: ''  RemovePrefix: ''  Version: 122929142753483

Logfile reports suspect ConnectionTimedOut errors:

<Event Severity="10" Time="1677185231.067122" DateTime="2023-02-23T20:47:11Z" Type="Net2Starting" ID="0000000000000000" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067845" DateTime="2023-02-23T20:47:11Z" Type="ProgramStart" ID="0000000000000000" SourceVersion="dc4bd7ef6fcda345276073599ed6a2b28b089e90" Version="7.1.27" PackageName="7.1" ActualTime="1677185231" CommandLine="/usr/lib/foundationdb/backup_agent/backup_agent --cluster-file=/etc/foundationdb/fdb.cluster --logdir=/var/log/foundationdb" MemoryLimit="8589934592" ThreadID="9749452796639931957" TrackLatestType="Original" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="ClientStart" ID="0000000000000000" SourceVersion="dc4bd7ef6fcda345276073599ed6a2b28b089e90" Version="7.1.27" PackageName="7.1" ActualTime="1677185231" ApiVersion="-1" ClientLibrary="/usr/lib/foundationdb/backup_agent/backup_agent" ImageOffset="(nil)" Primary="1" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" TrackLatestType="Original" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="MachineLoadDetail" ID="0000000000000000" User="1783166" Nice="405237" System="797197" Idle="24908964" IOWait="5692616" IRQ="98209" SoftIRQ="127443" Steal="0" Guest="0" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="Net2TLSConfig" ID="0000000000000000" CAPath="" CertificatePath="/etc/foundationdb/fdb.pem" KeyPath="/etc/foundationdb/fdb.pem" HasPassword="0" VerifyPeers="Check.Valid=1" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="NotifyAddressHealthy" ID="0000000000000000" SuppressedEventCount="0" Address="127.0.0.1:4500" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="DatabaseContextCreated" ID="2b27a262ee255c07" Backtrace="addr2line -e backup_agent.debug -p -C -f -i 0xee5d6c 0xaa1e41 0xaa62ac 0x562729 0x566365 0x54bbd0 0x7f1f5dff7d85" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="ConnectToDatabase" ID="2b27a262ee255c07" Version="7.1.27" ClusterFile="file:///etc/foundationdb/fdb.cluster" ConnectionString="vUJ6BoJN:eXcs6hFd9uVqmRQHZZE7jK9ovvLKFrX9@127.0.0.1:4500" ClientLibrary="/usr/lib/foundationdb/backup_agent/backup_agent" Primary="1" Internal="1" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" TrackLatestType="Original" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="CodeCoverage" ID="0000000000000000" File="/home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/ReadYourWrites.actor.cpp" Line="1545" Condition="true" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="StartingRunLoopProfilingThread" ID="0000000000000000" Interval="0.125" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="Net2Running" ID="0000000000000000" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="ProcessTimeOffset" ID="0000000000000000" ProcessTime="1677185231.071075" SystemTime="1677185231.071076" OffsetFromSystemTime="-0.000000" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.071113" DateTime="2023-02-23T20:47:11Z" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" PeerReferences="3" FailureStatus="OK" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.071113" DateTime="2023-02-23T20:47:11Z" Type="TaskBucketMetrics" ID="15c1621fdf501ca2" Elapsed="0" DispatchSlotChecksStarted="0 -1 0" DispatchErrors="0 -1 0" DispatchDoTasks="0 -1 0" DispatchEmptyTasks="0 -1 0" DispatchSlotChecksComplete="0 -1 0" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="20" Time="1677185231.071113" DateTime="2023-02-23T20:47:11Z" Type="N2_ConnectError" ID="479af99e20ed30b4" SuppressedEventCount="0" ErrorCode="13" Message="Permission denied" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.071113" DateTime="2023-02-23T20:47:11Z" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="10" Time="1677185231.071113" DateTime="2023-02-23T20:47:11Z" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="20" Time="1677185231.567337" DateTime="2023-02-23T20:47:11Z" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1677185231.446502" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f1f5e394cf0 0x7f1f5e391a35 0xe7ea3d 0xe80ccd 0x9818c2 0x54b46b 0x7f1f5dff7d85 0x55d762" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="20" Time="1677185231.817405" DateTime="2023-02-23T20:47:11Z" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1677185231.696634" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f1f5e394cf0 0x7f1f5e391a35 0xe7ea3d 0xe80ccd 0x9818c2 0x54b46b 0x7f1f5dff7d85 0x55d762" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
<Event Severity="20" Time="1677185232.067270" DateTime="2023-02-23T20:47:12Z" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1677185231.946765" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f1f5e394cf0 0x7f1f5e391a35 0xe7ea3d 0xe80ccd 0x9818c2 0x54b46b 0x7f1f5dff7d85 0x55d762" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />
[...]

It works on a CentOS 7 - but fails on CentOS 8

I would bet that this is some TLS issue between the agent and the cluster.

Look at this error:

<Event Severity="20" Time="1677185231.071113" DateTime="2023-02-23T20:47:11Z" Type="N2_ConnectError" ID="479af99e20ed30b4" SuppressedEventCount="0" ErrorCode="13" Message="Permission denied" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />

There’s two places in the code base that emit a N2_ConnectError and this is one of them: foundationdb/Net2.actor.cpp at 33f8e90f9f5fc8c571b439d11199a0db0f09fcc9 · apple/foundationdb · GitHub.

I would guess that since you said in your other post that FDB 6.0 worked, at some point TLS was switched to being the default :person_shrugging: ?

edit: Just realized that this is a trace through the system:

"Net2Starting"
"ProgramStart"
"ClientStart"
"MachineLoadDetail"
"Net2TLSConfig"
"NotifyAddressHealthy"
"DatabaseContextCreated"
"ConnectToDatabase"
"CodeCoverage"
"StartingRunLoopProfilingThread"
"Net2Running"
"ProcessTimeOffset"
"ConnectingTo"
"TaskBucketMetrics"
"N2_ConnectError"
"ConnectionTimedOut"
"ConnectionClosed"
"Net2RunLoopTrace"
"Net2RunLoopTrace"
"Net2RunLoopTrace"
<Event Severity="10" Time="1677185231.067190" DateTime="2023-02-23T20:47:11Z" Type="Net2TLSConfig" ID="0000000000000000" CAPath="" CertificatePath="/etc/foundationdb/fdb.pem" KeyPath="/etc/foundationdb/fdb.pem" HasPassword="0" VerifyPeers="Check.Valid=1" ThreadID="9749452796639931957" Machine="127.0.0.1:45893" LogGroup="default" ClientDescription="primary-7.1.27-9749452796639931957" />

It’s trying load that pem file, and there’s probably some issue with it. One thing that looks suspicious is that your CA path is blank. I would start there.