FoundationDB 6.3.9 with TLS -- fdbcli status could not communicate with a quorum of coordination servers

(You may be seeing several posts from me. I am trying to deploy a FoundationDB cluster to production with backup, restore, and DR capabilities.)

I am running this on Ubuntu 18.04 and have executed make_public.py with the -t option.
I followed the directions on the TLS page for configuration and was able to bring up fdbserver; however, fdbcli status reports that it could not communicate with a quorum of coordination servers, so the cluster is unusable.

foundationdb.conf

## foundationdb.conf
##
## Configuration file for FoundationDB server processes
## Full documentation is available at
## https://apple.github.io/foundationdb/configuration.html#the-configuration-file

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
## by default, restart_backoff = restart_delay_reset_interval = restart_delay
# initial_restart_delay = 0
# restart_backoff = 60
# restart_delay_reset_interval = 60
cluster_file = /etc/foundationdb/fdb.cluster
# delete_envvars =
# kill_on_configuration_change = true

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID:tls
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
# logsize = 10MiB
# maxlogssize = 100MiB
# machine_id = 
# datacenter_id = 
# class = 
# memory = 8GiB
# storage_memory = 1GiB
# cache_memory = 2GiB
# metrics_cluster = 
# metrics_prefix =

tls_certificate_file = /etc/foundationdb/fdb.pem
tls_ca_file = /etc/foundationdb/cert.crt
tls_key_file = /etc/foundationdb/private.key
tls_verify_peers=0

## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section
[fdbserver.4500]

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent --tls_certificate_file=/etc/foundationdb/fdb.pem --tls_ca_file=/etc/foundationdb/cert.crt --tls_key_file=/etc/foundationdb/private.key --tls_verify_peers=0
logdir = /var/log/foundationdb

[backup_agent.1]

fdb.cluster
efewXmJp:mp4e2u2h@192.168.49.3:4500:tls

status

vagrant@ubuntu1804:~/op$ sudo service foundationdb status
● foundationdb.service - LSB: start and stop foundationdb
   Loaded: loaded (/etc/init.d/foundationdb; generated)
   Active: active (running) since Mon 2021-05-17 15:06:25 UTC; 38min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 6957 ExecStop=/etc/init.d/foundationdb stop (code=exited, status=0/SUCCESS)
  Process: 7032 ExecStart=/etc/init.d/foundationdb start (code=exited, status=0/SUCCESS)
    Tasks: 1013 (limit: 2315)
   CGroup: /system.slice/foundationdb.service
           ├─7049 /usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize
           ├─7050 /usr/lib/foundationdb/backup_agent/backup_agent --tls_certificate_file /etc/foundationdb/fdb.pem --tls_ca_file /etc/foundationdb/cert.crt --tls_key_file /etc/foundationdb/private.key --t
           └─7051 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --public_address auto:45

May 17 15:06:25 ubuntu1804.localdomain systemd[1]: Starting LSB: start and stop foundationdb...
May 17 15:06:25 ubuntu1804.localdomain systemd[1]: Started LSB: start and stop foundationdb.
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="fdbmonitor": Watching conf file /etc/foundationdb/foundationdb.conf
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="fdbmonitor": Watching conf dir /etc/foundationdb/ (2)
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="fdbmonitor": Loading configuration /etc/foundationdb/foundationdb.conf
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="fdbmonitor": Starting fdbserver.4500
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="backup_agent.1": Launching /usr/lib/foundationdb/backup_agent/backup_agent (7050) for backup_agent.1
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="fdbserver.4500": Launching /usr/sbin/fdbserver (7051) for fdbserver.4500
May 17 15:06:25 ubuntu1804.localdomain fdbmonitor[7049]: LogGroup="default" Process="fdbserver.4500": FDBD joined cluster.

fdbcli command and output

vagrant@ubuntu1804:~/op$ fdbcli -C /etc/foundationdb/fdb.cluster --tls_certificate_file /etc/foundationdb/fdb.pem  --tls_ca_file /etc/foundationdb/cert.crt --tls_verify_peers Check.Valid=0  --log
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.


Welcome to the fdbcli. For help, type `help'.
fdb> status

Using cluster file `/etc/foundationdb/fdb.cluster'.

Could not communicate with a quorum of coordination servers:
  192.168.49.3:4500:tls  (unreachable)

fdbcli trace log

<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1621266378.486573" Type="Net2Starting" ID="0000000000000000" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.486634" Type="ClientStart" ID="0000000000000000" SourceVersion="cb52535f4699d30beceb7ad6220ab8d7299da946" Version="6.3.9" PackageName="6.3" ClusterFile="/etc/foundationdb/fdb.cluster" ConnectionString="efewXmJp:mp4e2u2h@192.168.49.3:4500:tls" ActualTime="1621266378" ApiVersion="-1" ImageOffset="(nil)" Machine="192.168.49.3:8286" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1621266378.486634" Type="MachineLoadDetail" ID="0000000000000000" User="3244" Nice="360" System="3046" Idle="1114169" IOWait="451" IRQ="0" SoftIRQ="314" Steal="0" Guest="0" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.486634" Type="Net2TLSConfig" ID="0000000000000000" CAPath="/etc/foundationdb/cert.crt" CertificatePath="/etc/foundationdb/fdb.pem" KeyPath="/etc/foundationdb/fdb.pem" HasPassword="0" VerifyPeers="Check.Valid=0" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.486634" Type="NotifyAddressHealthy" ID="0000000000000000" SuppressedEventCount="0" Address="192.168.49.3:4500:tls" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.486634" Type="CLIProgramStart" ID="0000000000000000" SourceVersion="cb52535f4699d30beceb7ad6220ab8d7299da946" Version="6.3.9" PackageName="6.3" ActualTime="1621266378" ClusterFile="/etc/foundationdb/fdb.cluster" ConnectionString="efewXmJp:mp4e2u2h@192.168.49.3:4500:tls" CommandLine="fdbcli -C /etc/foundationdb/fdb.cluster --tls_certificate_file /etc/foundationdb/fdb.pem --tls_ca_file /etc/foundationdb/cert.crt --tls_verify_peers Check.Valid=0 --log" Machine="192.168.49.3:8286" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1621266378.486634" Type="MonitorLeaderChange" ID="0000000000000000" NewLeader="0000000000000001" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.486634" Type="StartingRunLoopProfilingThread" ID="0000000000000000" Interval="0.125" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.486634" Type="Net2Running" ID="0000000000000000" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.486634" Type="ProcessTimeOffset" ID="0000000000000000" ProcessTime="1621266378.495821" SystemTime="1621266378.495822" OffsetFromSystemTime="-0.000001" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266378.495912" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" PeerReferences="6" FailureStatus="OK" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266378.987509" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266378.873160" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266379.238033" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266379.124208" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266379.487347" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266379.375513" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266379.738740" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266379.628671" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266379.989148" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266379.880956" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266380.239697" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266380.132175" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266380.239697" Type="N2_ConnectHandshakeError" ID="207020d2d3d67457" SuppressedEventCount="0" ErrorCode="104" Message="Connection reset by peer" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266380.239697" Type="ConnectionTimedOut" ID="207020d2d3d67457" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266380.239697" Type="ConnectionClosed" ID="207020d2d3d67457" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="10" Time="1621266380.470825" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" PeerReferences="6" FailureStatus="OK" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266380.470825" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266380.383604" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266380.740166" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266380.634122" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266380.990517" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266380.885752" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266381.241111" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266381.137127" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266381.471109" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266381.388299" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266381.741894" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266381.639353" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />
<Event Severity="20" Time="1621266381.992653" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621266381.890323" Trace="addr2line -e fdbcli.debug -p -C -f -i 0x7fe76d461980 0x7fe76d45e3b1 0x907b9f 0x909298 0x62fe82 0x49840b 0x7fe76caddbf7 0x4a19a5" Machine="192.168.49.3:8286" LogGroup="default" />

Can you check the server logs for any TLS or Connection related events around the time that fdbcli tried to connect? It seems the server is rejecting the connection and it may have some details as to why.
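If the trace files are too large to read by eye, something like the following could narrow them down. It is only a rough sketch: the function names and the keyword list are my own, and it assumes each event sits on one line as a self-closing `<Event ... />` element, as in the logs pasted in this thread.

```python
import xml.etree.ElementTree as ET

def iter_events(lines):
    """Yield the attributes of each <Event .../> line; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue
        try:
            yield dict(ET.fromstring(line).attrib)
        except ET.ParseError:
            # The last line may be incomplete if the process is still writing.
            continue

def connection_events(lines, start, end, keywords=("TLS", "Handshake", "Connect")):
    """Return events with start <= Time <= end whose Type contains a keyword."""
    hits = []
    for ev in iter_events(lines):
        t = float(ev.get("Time", "nan"))  # NaN fails both comparisons below
        if start <= t <= end and any(k in ev.get("Type", "") for k in keywords):
            hits.append(ev)
    return hits
```

To use it, open each `trace.*.xml` file under /var/log/foundationdb and pass the file object as `lines`, with `start` and `end` set to the epoch seconds around the fdbcli attempt (the `Time` values in the fdbcli trace give the window).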

Hi
I do see a handshake error: N2_ConnectHandshakeError

But I do not know what that means. Here is the full log

<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1621273825.695367" Type="Net2Starting" ID="0000000000000000" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.696830" Type="ProgramStart" ID="0000000000000000" SourceVersion="cb52535f4699d30beceb7ad6220ab8d7299da946" Version="6.3.9" PackageName="6.3" ActualTime="1621273825" CommandLine="/usr/lib/foundationdb/backup_agent/backup_agent --tls_certificate_file=/etc/foundationdb/fdb.pem --tls_ca_file=/etc/foundationdb/cert.crt --tls_key_file=/etc/foundationdb/private.key --tls_verify_peers=0 --cluster_file=/etc/foundationdb/fdb.cluster --logdir=/var/log/foundationdb" MemoryLimit="8589934592" TrackLatestType="Original" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.695382" Type="ClientStart" ID="0000000000000000" SourceVersion="cb52535f4699d30beceb7ad6220ab8d7299da946" Version="6.3.9" PackageName="6.3" ClusterFile="/etc/foundationdb/fdb.cluster" ConnectionString="efewXmJp:mp4e2u2h@192.168.49.3:4500:tls" ActualTime="1621273825" ApiVersion="-1" ImageOffset="(nil)" Machine="192.168.49.3:2721" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1621273825.695382" Type="MachineLoadDetail" ID="0000000000000000" User="407" Nice="5" System="767" Idle="33986" IOWait="201" IRQ="0" SoftIRQ="65" Steal="0" Guest="0" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.695382" Type="Net2TLSInitError" ID="0000000000000000" Error="file_not_readable" ErrorDescription="File could not be read" ErrorCode="1513" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.695382" Type="NotifyAddressHealthy" ID="0000000000000000" SuppressedEventCount="0" Address="192.168.49.3:4500:tls" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.695382" Type="CodeCoverage" ID="0000000000000000" File="/opt/foundation/__sourcedir_this_is_some_very_long_name_dir__/foundationdb/fdbclient/ReadYourWrites.actor.cpp" Line="1227" Condition="true" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.695382" Type="StartingRunLoopProfilingThread" ID="0000000000000000" Interval="0.125" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.695382" Type="Net2Running" ID="0000000000000000" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.695382" Type="ProcessTimeOffset" ID="0000000000000000" ProcessTime="1621273825.698201" SystemTime="1621273825.698201" OffsetFromSystemTime="0.000000" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.698221" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" PeerReferences="2" FailureStatus="OK" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.698221" Type="TaskBucketMetrics" ID="2bfda3d1514005de" Elapsed="0" DispatchSlotChecksStarted="0 -1 0" DispatchErrors="0 -1 0" DispatchDoTasks="0 -1 0" DispatchEmptyTasks="0 -1 0" DispatchSlotChecksComplete="0 -1 0" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273825.698221" Type="N2_ConnectError" ID="87c0e1e4c07ec1f8" SuppressedEventCount="0" ErrorCode="111" Message="Connection refused" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.698221" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273825.698221" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273826.196351" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621273826.075096" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f0218c58980 0x7f0218c553b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f02182d4bf7 0x4d5905" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273826.446743" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621273826.326023" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f0218c58980 0x7f0218c553b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f02182d4bf7 0x4d5905" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273826.696089" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621273826.576751" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f0218c58980 0x7f0218c553b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f02182d4bf7 0x4d5905" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273826.698440" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" PeerReferences="2" FailureStatus="OK" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273826.700053" Type="N2_ConnectHandshakeError" ID="7c42ddc1622e2fea" SuppressedEventCount="0" ErrorCode="336151568" Message="sslv3 alert handshake failure" WhichMeans="error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273826.700053" Type="ConnectionTimedOut" ID="7c42ddc1622e2fea" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273826.700053" Type="ConnectionClosed" ID="7c42ddc1622e2fea" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273827.197449" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621273827.078197" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f0218c58980 0x7f0218c553b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f02182d4bf7 0x4d5905" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273827.447729" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621273827.328858" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f0218c58980 0x7f0218c553b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f02182d4bf7 0x4d5905" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273827.696516" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1621273827.579646" Trace="addr2line -e backup_agent.debug -p -C -f -i 0x7f0218c58980 0x7f0218c553b1 0xb618ff 0xb62ff8 0x855b72 0x4c88aa 0x7f02182d4bf7 0x4d5905" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273827.703884" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" PeerReferences="2" FailureStatus="OK" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="20" Time="1621273827.705118" Type="N2_ConnectHandshakeError" ID="6087c7811e02f6b5" SuppressedEventCount="0" ErrorCode="336151568" Message="sslv3 alert handshake failure" WhichMeans="error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure" Machine="192.168.49.3:2721" LogGroup="default" />
<Event Severity="10" Time="1621273827.705118" Type="ConnectionTimedOut" ID="6087c7811e02f6b5" SuppressedEventCount="0" PeerAddr="192.168.49.3:4500:tls" Machine="192.168.49.3:2721" LogGroup="default" />

This is the trace log for the backup agent (see the ProgramStart trace line), but there is some interesting information here. In particular, the following event indicates that some TLS related file cannot be read:

Type="Net2TLSInitError" ID="0000000000000000" Error="file_not_readable"

Unfortunately it doesn’t seem to indicate which file this is, but perhaps you could check the following files for whether they exist and have appropriate permissions:

--tls_certificate_file=/etc/foundationdb/fdb.pem --tls_ca_file=/etc/foundationdb/cert.crt --tls_key_file=/etc/foundationdb/private.key
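One way to run that check is a small script like the one below. Treat it as a sketch: `check_readable` is a name I made up, and the result only reflects the user who runs the script, so it would need to be run as the account the processes use (`foundationdb` in the posted foundationdb.conf).

```python
import os

def check_readable(paths):
    """Map each path to 'ok', 'missing', or 'not readable' for the current user."""
    result = {}
    for p in paths:
        if not os.path.isfile(p):
            result[p] = "missing"
        elif not os.access(p, os.R_OK):
            result[p] = "not readable"
        else:
            result[p] = "ok"
    return result

# Paths taken from the backup_agent command line above.
TLS_FILES = [
    "/etc/foundationdb/fdb.pem",
    "/etc/foundationdb/cert.crt",
    "/etc/foundationdb/private.key",
]

if __name__ == "__main__":
    for path, status in check_readable(TLS_FILES).items():
        print(f"{status:13} {path}")
```

Saved under some name of your choosing (say `check_tls_files.py`), it could be run with `sudo -u foundationdb python3 check_tls_files.py` so the permissions are evaluated for the right user.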

To check the server logs, which would probably have the most information, you would need to look at the file generated for the coordinator you are trying to connect to: 192.168.49.3:4500.

I think I was able to fix the file_not_readable error by running chmod +644 on all three files.

The log file for 192.168.49.3:4500 is very large. Here are the first few lines.

I noticed that the log shows Error="tls_error" ErrorDescription="TLS error" ErrorCode="2107"

<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1621286408.622891" Type="Net2Starting" ID="0000000000000000" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="Net2TLSConfig" ID="0000000000000000" CAPath="/etc/foundationdb/cert.crt" CertificatePath="/etc/foundationdb/fdb.pem" KeyPath="/etc/foundationdb/private.key" HasPassword="0" VerifyPeers="0" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="40" Time="1621286408.622913" Type="FDBLibTLSVerifyPeersParseError" ID="0000000000000000" Config="0" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x23199fc 0x2319068 0x23192c1 0x7695bc 0x230f1ae 0x22c4548 0x7766cc 0x7f3db348abf7" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="Net2TLSInitError" ID="0000000000000000" Error="tls_error" ErrorDescription="TLS error" ErrorCode="2107" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="Binding" ID="0000000000000000" PublicAddress="192.168.49.3:4500:tls" ListenAddress="192.168.49.3:4500:tls" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="ProgramStart" ID="0000000000000000" RandomSeed="322729913" SourceVersion="cb52535f4699d30beceb7ad6220ab8d7299da946" Version="6.3.9" PackageName="6.3" FileSystem="" DataFolder="/var/lib/foundationdb/data/4500" WorkingDirectory="/" ClusterFile="/etc/foundationdb/fdb.cluster" ConnectionString="5wZd9z1o:UrefPhfH@192.168.49.3:4500:tls" ActualTime="1621286408" CommandLine="/usr/sbin/fdbserver --cluster_file=/etc/foundationdb/fdb.cluster --datadir=/var/lib/foundationdb/data/4500 --listen_address=public --logdir=/var/log/foundationdb --public_address=auto:4500:tls --tls_ca_file=/etc/foundationdb/cert.crt --tls_certificate_file=/etc/foundationdb/fdb.pem --tls_key_file=/etc/foundationdb/private.key --tls_verify_peers=0" BuggifyEnabled="0" MemoryLimit="8589934592" Machine="192.168.49.3:4500" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1621286408.622913" Type="StartingRunLoopProfilingThread" ID="0000000000000000" Interval="0.125" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="NotifyAddressHealthy" ID="0000000000000000" SuppressedEventCount="0" Address="192.168.49.3:4500:tls" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="StartingFDBD" ID="0000000000000000" ZoneID="da8a5b4647531a514ab94db9968189bf" MachineId="da8a5b4647531a514ab94db9968189bf" DiskPath="/var/lib/foundationdb/data/4500" CoordPath="/var/lib/foundationdb/data/4500" WhiteListBinPath="" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="CoordinationServer" ID="9d938e6a3f6217aa" MyInterfaceAddr="192.168.49.3:4500:tls" Folder="/var/lib/foundationdb/data/4500" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="KVSMemOpening" ID="9d938e6a3f6217aa" Basename="/var/lib/foundationdb/data/4500/coordination-" MemoryLimit="500000000" StoreType="memory" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="KVSMemRecoveryStarted" ID="9d938e6a3f6217aa" SnapshotEndLocation="0.-1" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="AsyncFileKAIOOpen" ID="0000000000000000" Filename="/var/lib/foundationdb/data/4500/coordination-0.fdq" Flags="458754" Mode="1024" Fd="19" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="AsyncFileKAIOOpen" ID="0000000000000000" Filename="/var/lib/foundationdb/data/4500/coordination-1.fdq" Flags="458754" Mode="1024" Fd="20" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="AFCUnderlyingOpenBegin" ID="0000000000000000" Filename="/var/lib/foundationdb/data/4500/processId" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="AsyncFileKAIOOpen" ID="0000000000000000" Filename="/var/lib/foundationdb/data/4500/processId" Flags="458754" Mode="1408" Fd="21" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="AFCUnderlyingOpenEnd" ID="0000000000000000" Filename="/var/lib/foundationdb/data/4500/processId" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="AFCUnderlyingSize" ID="0000000000000000" Filename="/var/lib/foundationdb/data/4500/processId" Size="4096" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="Net2Running" ID="0000000000000000" Machine="192.168.49.3:4500" LogGroup="default" />
<Event Severity="10" Time="1621286408.622913" Type="ProcessTimeOffset" ID="0000000000000000" ProcessTime="1621286408.644830" SystemTime="1621286408.644830" OffsetFromSystemTime="0.000000" 

The log files are very large; however, I have attached them here (the content disposition is attachment).

Please note that I used the FoundationDB instructions from here to create the cert and pem files.

Also … thanks for looking into this.

I see a couple of interesting things in these logs. The first is that the server can’t parse the verify peers string:

<Event Severity="40" Time="1621286408.622913" Type="FDBLibTLSVerifyPeersParseError" ID="0000000000000000" Config="0" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x23199fc 0x2319068 0x23192c1 0x7695bc 0x230f1ae 0x22c4548 0x7766cc 0x7f3db348abf7" Machine="192.168.49.3:4500" LogGroup="default" />

This is being passed in with the argument:

--tls_verify_peers=0

If the intent is to disable verification, I think this needs to be:

--tls_verify_peers="Check.Valid=0"

Or if using foundationdb.conf:

tls_verify_peers=Check.Valid=0
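Since the same flag also appears inside the `[backup_agent]` command line, it may be worth checking every place it is set. Here is a rough sketch using Python's `configparser`; the helper names are hypothetical, and `looks_suspicious` is only a heuristic (it flags values that contain no `field=value` check at all, like the bare `0`), not FoundationDB's actual parser.

```python
import configparser
import re

def find_verify_peers(conf_text):
    """Return (section, value) pairs for every tls_verify_peers setting found."""
    cp = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    cp.read_string(conf_text)
    found = []
    for section in cp.sections():
        for key, value in cp.items(section):
            if key == "tls_verify_peers":
                found.append((section, value))
            elif key == "command":
                # Flags embedded in a command line, as in [backup_agent].
                for m in re.finditer(r"--tls_verify_peers[= ](\S+)", value):
                    found.append((section, m.group(1)))
    return found

def looks_suspicious(value):
    """Heuristic: True when the value has no 'field=value' check in it."""
    return "=" not in value.strip("\"'")
```

Reading /etc/foundationdb/foundationdb.conf into a string and passing it to `find_verify_peers` should list both the `[fdbserver]` setting and the one embedded in the backup agent command, so neither is missed when updating.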

The second thing I see is that there are connections failing the handshake:

<Event Severity="20" Time="1621286409.626959" Type="N2_AcceptHandshakeError" ID="0000000000000000" ErrorCode="337092801" ErrorMsg="no shared cipher" BackgroundThread="1" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />

It’s possible this is just because of the inability to initialize TLS above, so I’d suggest fixing that and seeing if this second problem also goes away.
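As an aside, the large numeric ErrorCode values on the handshake events appear to be OpenSSL error codes printed in decimal. The backup agent trace earlier in the thread shows ErrorCode="336151568" next to "error:14094410", and 336151568 is 0x14094410. The sketch below converts a code to hex and splits it into library, function, and reason numbers, assuming the OpenSSL 1.1.x packing (8 bits, 12 bits, 12 bits); that layout is my assumption and differs in other OpenSSL versions.

```python
def decode_openssl_error(code):
    """Split a packed OpenSSL 1.1.x error code into (hex, lib, func, reason)."""
    code = int(code)
    return (
        format(code, "08X"),
        (code >> 24) & 0xFF,   # library number
        (code >> 12) & 0xFFF,  # function number
        code & 0xFFF,          # reason number
    )

for raw in ("336151568", "337092801"):
    hex_code, lib, func, reason = decode_openssl_error(raw)
    print(f"{raw} -> error:{hex_code} lib={lib} func={func} reason={reason}")
```

The hex form can then be passed to `openssl errstr`, for example `openssl errstr 1417A0C1`, to get the text description; what it prints depends on the OpenSSL version installed.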

So that sort of worked and sort of didn’t work. I set --tls_verify_peers="Check.Valid=0" as you suggested and the initial errors went away.

After this I started fdbcli several times. Sometimes fdbcli would report that the database is available, and sometimes that it is unavailable.

vagrant@ubuntu1804:~/ss1$ fdbcli -C /etc/foundationdb/fdb.cluster --tls_certificate_file /etc/foundationdb/fdb.pem  --tls_ca_file /etc/foundationdb/cert.crt --tls_verify_peers "Check.Valid=0"  --log
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> quit
vagrant@ubuntu1804:~/ss1$ fdbcli -C /etc/foundationdb/fdb.cluster --tls_certificate_file /etc/foundationdb/fdb.pem  --tls_ca_file /etc/foundationdb/cert.crt --tls_verify_peers "Check.Valid=0"  --log
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type `help'.
fdb> 
fdb> quit
vagrant@ubuntu1804:~/ss1$ fdbcli -C /etc/foundationdb/fdb.cluster --tls_certificate_file /etc/foundationdb/fdb.pem  --tls_ca_file /etc/foundationdb/cert.crt --tls_verify_peers "Check.Valid=0"  --log
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> qui
ERROR: Unknown command `qui'. Try `help'?
fdb> quit
vagrant@ubuntu1804:~/ss1$ fdbcli -C /etc/foundationdb/fdb.cluster --tls_certificate_file /etc/foundationdb/fdb.pem  --tls_ca_file /etc/foundationdb/cert.crt --tls_verify_peers "Check.Valid=0"  --log
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type `help'.
fdb> 

After this I ran grep -B 1 -A 1 --color 'Error' * on the FoundationDB log files. I see error codes 1026 and 337092801, but I am not sure what they mean.

Here is the partial output of the grep:

trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286408.622913" Type="Net2TLSConfig" ID="0000000000000000" CAPath="/etc/foundationdb/cert.crt" CertificatePath="/etc/foundationdb/fdb.pem" KeyPath="/etc/foundationdb/private.key" HasPassword="0" VerifyPeers="0" Machine="192.168.49.3:4500" LogGroup="default" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="40" Time="1621286408.622913" Type="FDBLibTLSVerifyPeersParseError" ID="0000000000000000" Config="0" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x23199fc 0x2319068 0x23192c1 0x7695bc 0x230f1ae 0x22c4548 0x7766cc 0x7f3db348abf7" Machine="192.168.49.3:4500" LogGroup="default" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="10" Time="1621286408.622913" Type="Net2TLSInitError" ID="0000000000000000" Error="tls_error" ErrorDescription="TLS error" ErrorCode="2107" Machine="192.168.49.3:4500" LogGroup="default" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286408.622913" Type="Binding" ID="0000000000000000" PublicAddress="192.168.49.3:4500:tls" ListenAddress="192.168.49.3:4500:tls" Machine="192.168.49.3:4500" LogGroup="default" />
--
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286408.715328" Type="CCWDB" ID="c5db5a0b1c189ced" Recruiting="Master" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="10" Time="1621286408.715328" Type="CCWDB" ID="c5db5a0b1c189ced" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Master="a8618a5b6b9d14e3" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="20" Time="1621286408.715328" Type="ClusterWatchDatabaseRetrying" ID="c5db5a0b1c189ced" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286408.715328" Type="TimeKeeperStarted" ID="0000000000000000" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
--
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286409.623794" Type="ConnectionFrom" ID="0ba2b6841df1ba55" SuppressedEventCount="0" FromAddress="192.168.49.3:54956:tls" ListenAddress="192.168.49.3:4500:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="20" Time="1621286409.626959" Type="N2_AcceptHandshakeError" ID="0000000000000000" ErrorCode="337092801" ErrorMsg="no shared cipher" BackgroundThread="1" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="10" Time="1621286409.627265" Type="IncomingConnectionError" ID="0ba2b6841df1ba55" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="192.168.49.3:54956:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286409.648474" Type="CCWDB" ID="c5db5a0b1c189ced" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
--
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286409.777587" Type="ResolverMetrics" ID="996e9d3c97f26466" Elapsed="0" ResolveBatchIn="0 -1 0" ResolveBatchStart="0 -1 0" ResolvedTransactions="0 -1 0" ResolvedBytes="0 -1 0" ResolvedReadConflictRanges="0 -1 0" ResolvedWriteConflictRanges="0 -1 0" TransactionsAccepted="0 -1 0" TransactionsTooOld="0 -1 0" TransactionsConflicted="0 -1 0" ResolvedStateTransactions="0 -1 0" ResolvedStateMutations="0 -1 0" ResolvedStateBytes="0 -1 0" ResolveBatchOut="0 -1 0" MetricsRequests="0 -1 0" SplitRequests="0 -1 0" Version="-1" NeededVersion="0" TotalStateBytes="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,MP,MS,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="10" Time="1621286409.778182" Type="ProxyMetrics" ID="644443842a2fd473" Elapsed="0" TxnRequestIn="0 -1 0" TxnRequestOut="0 -1 0" TxnRequestErrors="0 -1 0" TxnStartIn="0 -1 0" TxnStartOut="0 -1 0" TxnStartBatch="0 -1 0" TxnSystemPriorityStartIn="0 -1 0" TxnSystemPriorityStartOut="0 -1 0" TxnBatchPriorityStartIn="0 -1 0" TxnBatchPriorityStartOut="0 -1 0" TxnDefaultPriorityStartIn="0 -1 0" TxnDefaultPriorityStartOut="0 -1 0" TxnCommitIn="0 -1 0" TxnCommitVersionAssigned="0 -1 0" TxnCommitResolving="0 -1 0" TxnCommitResolved="0 -1 0" TxnCommitOut="0 -1 0" TxnCommitOutSuccess="0 -1 0" TxnCommitErrors="0 -1 0" TxnConflicts="0 -1 0" TxnThrottled="0 -1 0" CommitBatchIn="0 -1 0" CommitBatchOut="0 -1 0" MutationBytes="0 -1 0" Mutations="0 -1 0" ConflictRanges="0 -1 0" KeyServerLocationIn="0 -1 0" KeyServerLocationOut="0 -1 0" KeyServerLocationErrors="0 -1 0" LastAssignedCommitVersion="0" Version="0" CommittedVersion="1812439084" CommitBatchesMemBytesCount="0" MaxCompute="0" MinCompute="1000000000000" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,MP,MS,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286409.782226" Type="TLogRecover" ID="3c783ad8d1a5382e" LogId="44854720657f76a6" At="1712439084" Known="1710303754" Unrecovered="1710303755" Tags="-1:2,0:0,-7:0" Locality="-1" LogRouterTags="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,MP,MS,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286409.891909" Type="TotalDataInFlight" ID="d133ad6979ae0e61" Primary="1" TotalBytes="0" UnhealthyServers="0" ServerCount="1" StorageTeamSize="1" HighestPriority="140" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="20" Time="1621286409.892701" Type="RecruitStorageNotAvailable" ID="c5db5a0b1c189ced" IsCriticalRecruitment="0" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286409.892701" Type="DDExcludedServersChanged" ID="d133ad6979ae0e61" RowsExcluded="0" RowsFailed="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286410.258894" Type="RkUpdateBatch" ID="389b2d1dbc9da838" TPSLimit="1.17227e+06" Reason="4" ReasonServerID="44854720657f76a6" ReleasedTPS="14.202" ReleasedBatchTPS="0" TPSBasis="14.202" StorageServers="0" Proxies="1" TLogs="1" WorstFreeSpaceStorageServer="9223372036854775807" WorstFreeSpaceTLog="115916430778" WorstStorageServerQueue="0" LimitingStorageServerQueue="0" WorstTLogQueue="4387" TotalDiskUsageBytes="104865792" WorstStorageServerVersionLag="0" LimitingStorageServerVersionLag="0" WorstStorageServerDurabilityLag="0" LimitingStorageServerDurabilityLag="0" TagsAutoThrottled="0" TagsManuallyThrottled="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="20" Time="1621286410.270913" Type="RecruitStorageNotAvailable" ID="c5db5a0b1c189ced" SuppressedEventCount="0" OutstandingReq="0" IsCriticalRecruitment="0" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286410.270913" Type="GetTLogTeamDone" ID="0000000000000000" Completed="1" Policy="1" Results="1" Processes="1" Workers="1" Required="1" Desired="3" RatingTests="200" PolicyGenerations="100" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="10" Time="1621286410.626966" Type="ConnectionFrom" ID="97794f03e175e2a8" SuppressedEventCount="0" FromAddress="192.168.49.3:54958:tls" ListenAddress="192.168.49.3:4500:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="20" Time="1621286410.630232" Type="N2_AcceptHandshakeError" ID="0000000000000000" ErrorCode="337092801" ErrorMsg="no shared cipher" BackgroundThread="1" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml:<Event Severity="10" Time="1621286410.630572" Type="IncomingConnectionError" ID="97794f03e175e2a8" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="192.168.49.3:54958:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml-<Event Severity="20" Time="1621286410.763146" Type="MasterProxyRateLeaseExpired" ID="644443842a2fd473" SuppressedEventCount="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
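
Grepping for 'Error' matches any line that contains the word, including attribute names such as TxnRequestErrors in ProxyMetrics, so routine metrics get mixed in with real problems. As a rough alternative, the Python sketch below counts events that either carry an Error attribute or have a Severity of at least 20. The function names and the severity threshold are illustrative assumptions, not part of any FoundationDB tooling; the sample lines are shortened copies of lines from the output above.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parse_event(line):
    """Parse one trace line (optionally prefixed by grep's file name) into a dict."""
    start = line.find("<Event ")
    if start == -1:
        return None
    try:
        return dict(ET.fromstring(line[start:].strip()).attrib)
    except ET.ParseError:
        return None  # wrapped or truncated line


def summarize_problems(lines, min_severity=20):
    """Count (Type, detail) pairs for events with an Error attribute or high severity."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        if "Error" in event or int(event.get("Severity", "0")) >= min_severity:
            detail = event.get("Error") or event.get("ErrorMsg") or ""
            counts[(event.get("Type", "?"), detail)] += 1
    return counts


# Shortened copies of lines from the grep output above.
SAMPLE = [
    'trace.xml-<Event Severity="10" Type="ConnectionFrom" FromAddress="192.168.49.3:54956:tls" />',
    'trace.xml:<Event Severity="20" Type="N2_AcceptHandshakeError" ErrorCode="337092801" ErrorMsg="no shared cipher" />',
    'trace.xml:<Event Severity="10" Type="IncomingConnectionError" Error="connection_failed" ErrorCode="1026" />',
    'trace.xml:<Event Severity="10" Type="ProxyMetrics" TxnRequestErrors="0 -1 0" />',
]

for (event_type, detail), n in sorted(summarize_problems(SAMPLE).items()):
    print(n, event_type, detail)
```

With the sample above, the ConnectionFrom and ProxyMetrics lines are skipped, and only the handshake error and the connection failure are counted.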

This is my updated conf file:

## foundationdb.conf
##
## Configuration file for FoundationDB server processes
## Full documentation is available at
## https://apple.github.io/foundationdb/configuration.html#the-configuration-file

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
## by default, restart_backoff = restart_delay_reset_interval = restart_delay
# initial_restart_delay = 0
# restart_backoff = 60
# restart_delay_reset_interval = 60
cluster_file = /etc/foundationdb/fdb.cluster
# delete_envvars =
# kill_on_configuration_change = true

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID:tls
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
# logsize = 10MiB
# maxlogssize = 100MiB
# machine_id = 
# datacenter_id = 
# class = 
# memory = 8GiB
# storage_memory = 1GiB
# cache_memory = 2GiB
# metrics_cluster = 
# metrics_prefix =

tls_certificate_file = /etc/foundationdb/fdb.pem
tls_ca_file = /etc/foundationdb/cert.crt
tls_key_file = /etc/foundationdb/private.key
tls_verify_peers = Check.Valid=0


## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section
[fdbserver.4500]

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent  --tls_certificate_file=/etc/foundationdb/fdb.pem --tls_ca_file=/etc/foundationdb/cert.crt --tls_key_file=/etc/foundationdb/private.key --tls_verify_peers="Check.Valid=0"
logdir = /var/log/foundationdb

[backup_agent.1]
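
The first configuration in this thread set tls_verify_peers to a bare 0, and the log reports FDBLibTLSVerifyPeersParseError with Config="0"; the updated file uses Check.Valid=0 instead. A small check like the Python sketch below could flag a bare value before a restart. It is only a heuristic, assuming that a usable value always contains at least one name=value field; the function name is hypothetical and the field names themselves are not validated.

```python
import configparser


def find_bare_verify_peers(conf_text):
    """Return (section, value) pairs whose tls_verify_peers value contains no '='."""
    # Only '=' separates keys from values here, so values such as
    # 'auto:$ID:tls' are left intact; interpolation is disabled on purpose.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    suspicious = []
    for section in parser.sections():
        value = parser[section].get("tls_verify_peers")
        if value is not None and "=" not in value:
            suspicious.append((section, value))
    return suspicious


# Excerpts of the two configurations shown in this thread.
OLD_CONF = """
[fdbserver]
public_address = auto:$ID:tls
tls_verify_peers=0
"""

NEW_CONF = """
[fdbserver]
public_address = auto:$ID:tls
tls_verify_peers = Check.Valid=0
"""

print(find_bare_verify_peers(OLD_CONF))
print(find_bare_verify_peers(NEW_CONF))
```

The sketch only looks at keys in foundationdb.conf, so options passed on a command line, such as --tls_verify_peers in the [backup_agent] command, are not checked.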

I think your grep output is from the log of the previous fdbserver run; fdbserver would have restarted when you changed foundationdb.conf. Try looking for the newer log files, which should have different values for the timestamp and random-suffix parts of the file name (1621286408.vnOMfd in this one):

trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml
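
The trace file names in this thread differ between runs only in the start timestamp and the random suffix (1621286408.vnOMfd for the older run, 1621366816.OZHsXU for the newer one). Assuming the names follow the pattern seen here, trace.<ip>.<port>.<timestamp>.<suffix>.<n>.<m>.xml with an IPv4 address, a short Python sketch can pick the newest run. The regular expression and the function name are illustrative assumptions inferred from these names, not a documented format.

```python
import re

# Pattern inferred from the file names in this thread (IPv4 only).
TRACE_NAME = re.compile(
    r"^trace\.(?P<ip>\d+\.\d+\.\d+\.\d+)\.(?P<port>\d+)"
    r"\.(?P<started>\d+)\.(?P<suffix>\w+)\.\d+\.\d+\.xml$"
)


def newest_trace(names):
    """Return the trace file name with the largest start timestamp, or None."""
    best_name, best_started = None, -1
    for name in names:
        match = TRACE_NAME.match(name)
        if match is None:
            continue  # not a trace file in the assumed format
        started = int(match.group("started"))
        if started > best_started:
            best_name, best_started = name, started
    return best_name


names = [
    "trace.192.168.49.3.4500.1621286408.vnOMfd.0.1.xml",
    "trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml",
    "notes.txt",  # hypothetical non-trace file, ignored by the pattern
]
print(newest_trace(names))
```

In practice the names would come from listing the logdir set in foundationdb.conf (/var/log/foundationdb in this thread), for example with os.listdir.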

Were you by any chance able to run status and see why it thought the cluster was unavailable?

Oops, I grepped yesterday's logs. Sorry about that.

The status, when fdbcli reports that the server is unavailable, is:

"Could not communicate with a quorum of coordination servers:

192.168.49.3:4500:tls (unreachable)"

Here are the grep results for errors in the newer log files:

trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366816.391023" Type="CCWDB" ID="7cc16968130b277d" Recruiting="Master" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366816.391023" Type="CCWDB" ID="7cc16968130b277d" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Master="889e6244954594e3" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366816.391023" Type="ClusterWatchDatabaseRetrying" ID="7cc16968130b277d" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366816.391023" Type="TimeKeeperStarted" ID="0000000000000000" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.295293" Type="ConnectionFrom" ID="68a66b01173feda9" SuppressedEventCount="0" FromAddress="192.168.49.3:33060:tls" ListenAddress="192.168.49.3:4500:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366817.307982" Type="N2_AcceptHandshakeError" ID="0000000000000000" ErrorCode="337100999" ErrorMsg="peer did not return a certificate" BackgroundThread="1" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366817.308062" Type="IncomingConnectionError" ID="68a66b01173feda9" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="192.168.49.3:33060:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.313709" Type="CCWDB" ID="7cc16968130b277d" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.441176" Type="ResolverMetrics" ID="8c2a96d21259890e" Elapsed="0" ResolveBatchIn="0 -1 0" ResolveBatchStart="0 -1 0" ResolvedTransactions="0 -1 0" ResolvedBytes="0 -1 0" ResolvedReadConflictRanges="0 -1 0" ResolvedWriteConflictRanges="0 -1 0" TransactionsAccepted="0 -1 0" TransactionsTooOld="0 -1 0" TransactionsConflicted="0 -1 0" ResolvedStateTransactions="0 -1 0" ResolvedStateMutations="0 -1 0" ResolvedStateBytes="0 -1 0" ResolveBatchOut="0 -1 0" MetricsRequests="0 -1 0" SplitRequests="0 -1 0" Version="-1" NeededVersion="0" TotalStateBytes="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,MP,MS,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366817.441906" Type="ProxyMetrics" ID="ebd46770fe2fba59" Elapsed="0" TxnRequestIn="0 -1 0" TxnRequestOut="0 -1 0" TxnRequestErrors="0 -1 0" TxnStartIn="0 -1 0" TxnStartOut="0 -1 0" TxnStartBatch="0 -1 0" TxnSystemPriorityStartIn="0 -1 0" TxnSystemPriorityStartOut="0 -1 0" TxnBatchPriorityStartIn="0 -1 0" TxnBatchPriorityStartOut="0 -1 0" TxnDefaultPriorityStartIn="0 -1 0" TxnDefaultPriorityStartOut="0 -1 0" TxnCommitIn="0 -1 0" TxnCommitVersionAssigned="0 -1 0" TxnCommitResolving="0 -1 0" TxnCommitResolved="0 -1 0" TxnCommitOut="0 -1 0" TxnCommitOutSuccess="0 -1 0" TxnCommitErrors="0 -1 0" TxnConflicts="0 -1 0" TxnThrottled="0 -1 0" CommitBatchIn="0 -1 0" CommitBatchOut="0 -1 0" MutationBytes="0 -1 0" Mutations="0 -1 0" ConflictRanges="0 -1 0" KeyServerLocationIn="0 -1 0" KeyServerLocationOut="0 -1 0" KeyServerLocationErrors="0 -1 0" LastAssignedCommitVersion="0" Version="0" CommittedVersion="102130605" CommitBatchesMemBytesCount="0" MaxCompute="0" MinCompute="1000000000000" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,MP,MS,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.446396" Type="TLogRecover" ID="5098362959344930" LogId="8fc8462638de8e37" At="2130605" Known="49375" Unrecovered="49376" Tags="-1:2,0:0,-7:0" Locality="-1" LogRouterTags="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,MP,MS,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.536632" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" PeerReferences="24" FailureStatus="OK" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366817.536632" Type="N2_ConnectError" ID="be0e2ce4fe0e98fc" SuppressedEventCount="0" ErrorCode="111" Message="Connection refused" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.536632" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366817.536632" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.537958" Type="UpdateChangedDatacenter" ID="7cc16968130b277d" CCFirst="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.575294" Type="TotalDataInFlight" ID="87daf537e486edf5" Primary="1" TotalBytes="0" UnhealthyServers="0" ServerCount="1" StorageTeamSize="1" HighestPriority="140" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366817.575997" Type="RecruitStorageNotAvailable" ID="7cc16968130b277d" IsCriticalRecruitment="0" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.575997" Type="DDExcludedServersChanged" ID="87daf537e486edf5" RowsExcluded="0" RowsFailed="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.840559" Type="RkUpdateBatch" ID="29c9330ac44908be" TPSLimit="1.0539e+06" Reason="4" ReasonServerID="8fc8462638de8e37" ReleasedTPS="14.0658" ReleasedBatchTPS="0" TPSBasis="14.0658" StorageServers="0" Proxies="1" TLogs="1" WorstFreeSpaceStorageServer="9223372036854775807" WorstFreeSpaceTLog="115916664488" WorstStorageServerQueue="0" LimitingStorageServerQueue="0" WorstTLogQueue="9869" TotalDiskUsageBytes="104865792" WorstStorageServerVersionLag="0" LimitingStorageServerVersionLag="0" WorstStorageServerDurabilityLag="0" LimitingStorageServerDurabilityLag="0" TagsAutoThrottled="0" TagsManuallyThrottled="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366817.940892" Type="RecruitStorageNotAvailable" ID="7cc16968130b277d" SuppressedEventCount="0" OutstandingReq="0" IsCriticalRecruitment="0" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366817.940892" Type="GetTLogTeamDone" ID="0000000000000000" Completed="1" Policy="1" Results="1" Processes="1" Workers="1" Required="1" Desired="3" RatingTests="200" PolicyGenerations="100" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366818.298972" Type="ConnectionFrom" ID="618111542a316ae4" SuppressedEventCount="0" FromAddress="192.168.49.3:33076:tls" ListenAddress="192.168.49.3:4500:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366818.318686" Type="N2_AcceptHandshakeError" ID="0000000000000000" ErrorCode="337100999" ErrorMsg="peer did not return a certificate" BackgroundThread="1" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366818.318772" Type="IncomingConnectionError" ID="618111542a316ae4" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="192.168.49.3:33076:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366818.537114" Type="GenerationRegReadRequest" ID="0000000000000000" From="192.168.49.3:4500:tls" K="koy0cvjM:mrCnQ1Vd" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366818.550175" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="7" PeerAddr="127.0.0.1:4500" PeerReferences="24" FailureStatus="OK" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366818.550175" Type="N2_ConnectError" ID="fa2d4d782823ace3" SuppressedEventCount="7" ErrorCode="111" Message="Connection refused" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366818.550175" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="7" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366818.550175" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="7" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366818.560264" Type="RkTracking" ID="29c9330ac44908be" StorageServer="b982c93c6c160862" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366819.559287" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="2" PeerAddr="127.0.0.1:4500" PeerReferences="12" FailureStatus="OK" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366819.559287" Type="N2_ConnectError" ID="f5855cb0f81e5594" SuppressedEventCount="2" ErrorCode="111" Message="Connection refused" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366819.559287" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="2" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366819.559287" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="2" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366819.700969" Type="SomewhatSlowRunLoopTop" ID="0000000000000000" Elapsed="0.0241761" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366820.967672" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="1" PeerAddr="127.0.0.1:4500" PeerReferences="12" FailureStatus="OK" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366820.967672" Type="N2_ConnectError" ID="a6d07212e14a7584" SuppressedEventCount="1" ErrorCode="111" Message="Connection refused" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366820.967672" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="1" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366820.967672" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="1" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366821.034494" Type="RkUpdateBatch" ID="29c9330ac44908be" TPSLimit="4.63163e+06" Reason="4" ReasonServerID="8fc8462638de8e37" ReleasedTPS="2.53507" ReleasedBatchTPS="0" TPSBasis="2.53507" StorageServers="1" Proxies="1" TLogs="1" WorstFreeSpaceStorageServer="973741178" WorstFreeSpaceTLog="115916572959" WorstStorageServerQueue="59402" LimitingStorageServerQueue="59402" WorstTLogQueue="9869" TotalDiskUsageBytes="104886272" WorstStorageServerVersionLag="0" LimitingStorageServerVersionLag="0" WorstStorageServerDurabilityLag="103757267" LimitingStorageServerDurabilityLag="103757267" TagsAutoThrottled="0" TagsManuallyThrottled="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366821.225431" Type="RkUpdateBatch" ID="29c9330ac44908be" TPSLimit="4.63163e+06" Reason="4" ReasonServerID="8fc8462638de8e37" ReleasedTPS="2.09444" ReleasedBatchTPS="0" TPSBasis="2.09444" StorageServers="1" Proxies="1" TLogs="1" WorstFreeSpaceStorageServer="973741178" WorstFreeSpaceTLog="115916571735" WorstStorageServerQueue="59402" LimitingStorageServerQueue="59402" WorstTLogQueue="9869" TotalDiskUsageBytes="104886272" WorstStorageServerVersionLag="0" LimitingStorageServerVersionLag="0" WorstStorageServerDurabilityLag="103851762" LimitingStorageServerDurabilityLag="103851762" TagsAutoThrottled="0" TagsManuallyThrottled="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366821.312545" Type="IncomingConnectionError" ID="2dace11d0736fddf" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="192.168.49.3:33086:tls" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366821.317641" Type="TransactionMetrics" ID="e2a644c47f2f278b" Elapsed="0" Cluster="" Internal="1" ReadVersions="0 -1 0" ReadVersionsThrottled="0 -1 0" ReadVersionsCompleted="0 -1 0" ReadVersionBatches="0 -1 0" BatchPriorityReadVersions="0 -1 0" DefaultPriorityReadVersions="0 -1 0" ImmediatePriorityReadVersions="0 -1 0" BatchPriorityReadVersionsCompleted="0 -1 0" DefaultPriorityReadVersionsCompleted="0 -1 0" ImmediatePriorityReadVersionsCompleted="0 -1 0" LogicalUncachedReads="0 -1 0" PhysicalReadRequests="0 -1 0" PhysicalReadRequestsCompleted="0 -1 0" GetKeyRequests="0 -1 0" GetValueRequests="0 -1 0" GetRangeRequests="0 -1 0" WatchRequests="0 -1 0" GetAddressesForKeyRequests="0 -1 0" BytesRead="0 -1 0" KeysRead="0 -1 0" MetadataVersionReads="0 -1 0" CommittedMutations="0 -1 0" CommittedMutationBytes="0 -1 0" SetMutations="0 -1 0" ClearMutations="0 -1 0" AtomicMutations="0 -1 0" CommitStarted="0 -1 0" CommitCompleted="0 -1 0" KeyServerLocationRequests="0 -1 0" KeyServerLocationRequestsCompleted="0 -1 0" TooOld="0 -1 0" FutureVersions="0 -1 0" NotCommitted="0 -1 0" MaybeCommitted="0 -1 0" ResourceConstrained="0 -1 0" ProcessBehind="0 -1 0" Throttled="0 -1 0" MeanLatency="0" MedianLatency="0" Latency90="0" Latency98="0" MaxLatency="0" MeanRowReadLatency="0" MedianRowReadLatency="0" MaxRowReadLatency="0" MeanGRVLatency="0" MedianGRVLatency="0" MaxGRVLatency="0" MeanCommitLatency="0" MedianCommitLatency="0" MaxCommitLatency="0" MeanMutationsPerCommit="0" MedianMutationsPerCommit="0" MaxMutationsPerCommit="0" MeanBytesPerCommit="0" MedianBytesPerCommit="0" MaxBytesPerCommit="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366821.397765" Type="MachineLoadDetail" ID="0000000000000000" User="1604" Nice="93" System="1523" Idle="206590" IOWait="297" IRQ="0" SoftIRQ="109" Steal="0" Guest="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366821.397765" Type="ProcessMetrics" ID="0000000000000000" Elapsed="5.04604" CPUSeconds="0.238418" MainThreadCPUSeconds="0.197647" UptimeSeconds="5.08129" Memory="530567168" ResidentMemory="65507328" UnusedAllocatedMemory="0" MbpsSent="0" MbpsReceived="0" DiskTotalBytes="132224544768" DiskFreeBytes="122527784960" DiskQueueDepth="0" DiskIdleSeconds="4.99404" DiskReads="1" DiskWrites="50" DiskReadsCount="20560" DiskWritesCount="2800" DiskWriteSectors="688" DiskReadSectors="8" FileWrites="39" FileReads="1" CacheReadBytes="8192" CacheFinds="9" CacheWritesBlocked="1" CacheReadsBlocked="1" CachePageReadsMerged="0" CacheWrites="7" CacheReads="3" CacheHits="7" CacheMisses="1" CacheEvictions="0" ZoneID="8239170dec3560137f6277fe5c689e8f" MachineID="8239170dec3560137f6277fe5c689e8f" AIOSubmitCount="19" AIOCollectCount="19" AIOSubmitLag="2.60069e-07" AIODiskStall="1.39385e-05" CurrentConnections="0" ConnectionsEstablished="0" ConnectionsClosed="0" ConnectionErrors="0" TLSPolicyFailures="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366821.397765" Type="MemoryMetrics" ID="0000000000000000" TotalMemory16="393216" ApproximateUnusedMemory16="0" ActiveThreads16="3" TotalMemory32="393216" ApproximateUnusedMemory32="0" ActiveThreads32="3" TotalMemory64="1179648" ApproximateUnusedMemory64="0" ActiveThreads64="8" TotalMemory96="262080" ApproximateUnusedMemory96="0" ActiveThreads96="1" TotalMemory128="131072" ApproximateUnusedMemory128="0" ActiveThreads128="1" TotalMemory256="262144" ApproximateUnusedMemory256="0" ActiveThreads256="1" TotalMemory512="131072" ApproximateUnusedMemory512="0" ActiveThreads512="1" TotalMemory1024="131072" ApproximateUnusedMemory1024="0" ActiveThreads1024="1" TotalMemory2048="131072" ApproximateUnusedMemory2048="0" ActiveThreads2048="1" TotalMemory4096="131072" ApproximateUnusedMemory4096="0" ActiveThreads4096="1" TotalMemory8192="131072" ApproximateUnusedMemory8192="0" ActiveThreads8192="1" HugeArenaMemory="20305883" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366822.472212" Type="ResolverMetrics" ID="8c2a96d21259890e" Elapsed="5.03104" ResolveBatchIn="2.38519 3.35373 12" ResolveBatchStart="2.38519 3.35379 12" ResolvedTransactions="2.38519 -0.996365 12" ResolvedBytes="1272.1 0.938558 6400" ResolvedReadConflictRanges="5.56545 -0.989214 28" ResolvedWriteConflictRanges="4.77039 -0.99273 24" TransactionsAccepted="1.98766 -0.996969 10" TransactionsTooOld="0 -1 0" TransactionsConflicted="0.397532 -0.999227 2" ResolvedStateTransactions="1.7889 -0.997274 9" ResolvedStateMutations="2.78273 -0.995759 14" ResolvedStateBytes="376.463 -0.426308 1894" ResolveBatchOut="2.38519 3.35379 12" MetricsRequests="0 -1 0" SplitRequests="0 -1 0" Version="106519119" NeededVersion="0" TotalStateBytes="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366822.472212" Type="ProxyMetrics" ID="ebd46770fe2fba59" Elapsed="5.03031" TxnRequestIn="6.56024 4.03353 33" TxnRequestOut="6.56024 4.0337 33" TxnRequestErrors="0 -1 0" TxnStartIn="6.36144 3.90159 32" TxnStartOut="6.36144 3.90176 32" TxnStartBatch="6.16265 3.72863 31" TxnSystemPriorityStartIn="3.18072 1.50996 16" TxnSystemPriorityStartOut="3.18072 1.51003 16" TxnBatchPriorityStartIn="0 -1 0" TxnBatchPriorityStartOut="0 -1 0" TxnDefaultPriorityStartIn="3.18072 1.53435 16" TxnDefaultPriorityStartOut="3.18072 1.53636 16" TxnCommitIn="2.38554 -0.997443 12" TxnCommitVersionAssigned="2.38554 -0.997307 12" TxnCommitResolving="2.38554 -0.997307 12" TxnCommitResolved="2.38554 -0.996372 12" TxnCommitOut="2.38554 -0.996259 12" TxnCommitOutSuccess="1.98795 -0.996882 10" TxnCommitErrors="0 -1 0" TxnConflicts="0.39759 -0.99913 2" TxnThrottled="0 -1 0" CommitBatchIn="2.18675 2.99124 11" CommitBatchOut="2.18675 2.9758 11" MutationBytes="240.144 -0.634494 1208" Mutations="3.37952 -0.994856 17" ConflictRanges="10.3373 -0.988331 52" KeyServerLocationIn="2.78313 -0.996109 14" KeyServerLocationOut="2.78313 -0.995087 14" KeyServerLocationErrors="0 -1 0" LastAssignedCommitVersion="106519119" Version="106519119" CommittedVersion="106519119" CommitBatchesMemBytesCount="0" MaxCompute="3824" MinCompute="205" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366822.472769" Type="TransactionMetrics" ID="6c6a0b2ded8155f3" Elapsed="0" Cluster="" Internal="1" ReadVersions="1.23353e-09 1 2" ReadVersionsThrottled="0 -1 0" ReadVersionsCompleted="6.16764e-10 -6.08264e-09 1" ReadVersionBatches="1.23353e-09 1 2" BatchPriorityReadVersions="0 -1 0" DefaultPriorityReadVersions="1.23353e-09 1 2" ImmediatePriorityReadVersions="0 -1 0" BatchPriorityReadVersionsCompleted="0 -1 0" DefaultPriorityReadVersionsCompleted="6.16764e-10 -6.08264e-09 1" ImmediatePriorityReadVersionsCompleted="0 -1 0" LogicalUncachedReads="1.23353e-09 1 2" PhysicalReadRequests="6.16764e-10 -6.08264e-09 1" PhysicalReadRequestsCompleted="6.16764e-10 -6.06195e-09 1" GetKeyRequests="0 -1 0" GetValueRequests="1.23353e-09 1 2" GetRangeRequests="0 -1 0" WatchRequests="0 -1 0" GetAddressesForKeyRequests="0 -1 0" BytesRead="4.07064e-08 65 66" KeysRead="6.16764e-10 -6.06195e-09 1" MetadataVersionReads="0 -1 0" CommittedMutations="6.16764e-10 -6.05662e-09 1" CommittedMutationBytes="6.16764e-08 99 100" SetMutations="6.16764e-10 -6.06195e-09 1" ClearMutations="0 -1 0" AtomicMutations="0 -1 0" CommitStarted="6.16764e-10 -6.06195e-09 1" CommitCompleted="6.16764e-10 -6.05662e-09 1" KeyServerLocationRequests="6.16764e-10 -6.08264e-09 1" KeyServerLocationRequestsCompleted="6.16764e-10 -6.08264e-09 1" TooOld="0 -1 0" FutureVersions="0 -1 0" NotCommitted="0 -1 0" MaybeCommitted="0 -1 0" ResourceConstrained="0 -1 0" ProcessBehind="0 -1 0" Throttled="0 -1 0" MeanLatency="0.024812" MedianLatency="0.024812" Latency90="0.024812" Latency98="0.024812" MaxLatency="0.024812" MeanRowReadLatency="0.016773" MedianRowReadLatency="0.016773" MaxRowReadLatency="0.016773" MeanGRVLatency="0.00371671" MedianGRVLatency="0.00371671" MaxGRVLatency="0.00371671" MeanCommitLatency="0.00432229" MedianCommitLatency="0.00432229" MaxCommitLatency="0.00432229" MeanMutationsPerCommit="1" MedianMutationsPerCommit="1" 
MaxMutationsPerCommit="1" MeanBytesPerCommit="100" MedianBytesPerCommit="100" MaxBytesPerCommit="100" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366823.036703" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" PeerReferences="12" FailureStatus="OK" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="20" Time="1621366823.036703" Type="N2_ConnectError" ID="280d4e03ce81d01b" SuppressedEventCount="0" ErrorCode="111" Message="Connection refused" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366823.036703" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366823.036703" Type="NotifyAddressFailed" ID="0000000000000000" SuppressedEventCount="0" Address="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366823.036703" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366823.037968" Type="AllAlternativesFailed" ID="0000000000000000" Interval="1" Alternatives="b982c93c6c160862" Delay="0.05" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--

Your error now is “peer did not return a certificate”, though this may be coming from another process on the same host as your fdbcli, causing the server to insert connection delays. If you grep your logs for TLS, that might confirm this idea or point us toward something else.
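If plain grep output is too noisy to read, the same search can be sketched in Python. This is only a rough illustration, assuming each trace event sits on a single line as in the output above; the helper name `tls_events` is made up for this example and is not part of FoundationDB.

```python
import re

# Pull the Type and Severity attributes out of a single-line <Event ... /> record.
EVENT_TYPE = re.compile(r'Type="([^"]+)"')
SEVERITY = re.compile(r'Severity="(\d+)"')


def tls_events(lines):
    """Yield (severity, event_type, line) for every event line that mentions TLS."""
    for line in lines:
        if "TLS" not in line:
            continue
        type_match = EVENT_TYPE.search(line)
        severity_match = SEVERITY.search(line)
        if type_match and severity_match:
            yield int(severity_match.group(1)), type_match.group(1), line.rstrip("\r\n")


# Shortened sample lines modelled on the trace output in this thread.
sample = [
    '<Event Severity="10" Time="1" Type="ProcessMetrics" TLSPolicyFailures="0" />',
    '<Event Severity="20" Time="2" Type="N2_ConnectError" Message="Connection refused" />',
]

for severity, event_type, _ in tls_events(sample):
    print(severity, event_type)
```

To run it against a real trace file, pass an open file object instead of `sample`, for example `tls_events(open(path))` with `path` pointing at one of the files in the log directory.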

Hi AJ,
I grepped the log file for TLS.

I see a lot of “TLS Policy Failure” in the logs:

grep-out.txt-trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366821.397765" Type="MachineLoadDetail" ID="0000000000000000" User="1604" Nice="93" System="1523" Idle="206590" IOWait="297" IRQ="0" SoftIRQ="109" Steal="0" Guest="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
grep-out.txt:trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366821.397765" Type="ProcessMetrics" ID="0000000000000000" Elapsed="5.04604" CPUSeconds="0.238418" MainThreadCPUSeconds="0.197647" UptimeSeconds="5.08129" Memory="530567168" ResidentMemory="65507328" UnusedAllocatedMemory="0" MbpsSent="0" MbpsReceived="0" DiskTotalBytes="132224544768" DiskFreeBytes="122527784960" DiskQueueDepth="0" DiskIdleSeconds="4.99404" DiskReads="1" DiskWrites="50" DiskReadsCount="20560" DiskWritesCount="2800" DiskWriteSectors="688" DiskReadSectors="8" FileWrites="39" FileReads="1" CacheReadBytes="8192" CacheFinds="9" CacheWritesBlocked="1" CacheReadsBlocked="1" CachePageReadsMerged="0" CacheWrites="7" CacheReads="3" CacheHits="7" CacheMisses="1" CacheEvictions="0" ZoneID="8239170dec3560137f6277fe5c689e8f" MachineID="8239170dec3560137f6277fe5c689e8f" AIOSubmitCount="19" AIOCollectCount="19" AIOSubmitLag="2.60069e-07" AIODiskStall="1.39385e-05" CurrentConnections="0" ConnectionsEstablished="0" ConnectionsClosed="0" ConnectionErrors="0" TLSPolicyFailures="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
grep-out.txt-trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366821.397765" Type="MemoryMetrics" ID="0000000000000000" TotalMemory16="393216" ApproximateUnusedMemory16="0" ActiveThreads16="3" TotalMemory32="393216" ApproximateUnusedMemory32="0" ActiveThreads32="3" TotalMemory64="1179648" ApproximateUnusedMemory64="0" ActiveThreads64="8" TotalMemory96="262080" ApproximateUnusedMemory96="0" ActiveThreads96="1" TotalMemory128="131072" ApproximateUnusedMemory128="0" ActiveThreads128="1" TotalMemory256="262144" ApproximateUnusedMemory256="0" ActiveThreads256="1" TotalMemory512="131072" ApproximateUnusedMemory512="0" ActiveThreads512="1" TotalMemory1024="131072" ApproximateUnusedMemory1024="0" ActiveThreads1024="1" TotalMemory2048="131072" ApproximateUnusedMemory2048="0" ActiveThreads2048="1" TotalMemory4096="131072" ApproximateUnusedMemory4096="0" ActiveThreads4096="1" TotalMemory8192="131072" ApproximateUnusedMemory8192="0" ActiveThreads8192="1" HugeArenaMemory="20305883" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
grep-out.txt-trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366826.421801" Type="MachineLoadDetail" ID="0000000000000000" User="1608" Nice="93" System="1527" Idle="207572" IOWait="297" IRQ="0" SoftIRQ="109" Steal="0" Guest="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
grep-out.txt:trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366826.421801" Type="ProcessMetrics" ID="0000000000000000" Elapsed="5.02404" CPUSeconds="0.113222" MainThreadCPUSeconds="0.105273" UptimeSeconds="10.1053" Memory="530722816" ResidentMemory="65507328" UnusedAllocatedMemory="0" MbpsSent="0" MbpsReceived="0" DiskTotalBytes="132224544768" DiskFreeBytes="122527723520" DiskQueueDepth="0" DiskIdleSeconds="5.02004" DiskReads="0" DiskWrites="12" DiskReadsCount="20560" DiskWritesCount="2812" DiskWriteSectors="184" DiskReadSectors="0" FileWrites="8" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" CacheHits="0" CacheMisses="0" CacheEvictions="0" ZoneID="8239170dec3560137f6277fe5c689e8f" MachineID="8239170dec3560137f6277fe5c689e8f" AIOSubmitCount="4" AIOCollectCount="4" AIOSubmitLag="6.68973e-08" AIODiskStall="4.70417e-07" CurrentConnections="0" ConnectionsEstablished="0" ConnectionsClosed="0" ConnectionErrors="0" TLSPolicyFailures="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
grep-out.txt-trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366826.421801" Type="MemoryMetrics" ID="0000000000000000" TotalMemory16="393216" ApproximateUnusedMemory16="0" ActiveThreads16="3" TotalMemory32="393216" ApproximateUnusedMemory32="0" ActiveThreads32="3" TotalMemory64="1179648" ApproximateUnusedMemory64="0" ActiveThreads64="8" TotalMemory96="262080" ApproximateUnusedMemory96="0" ActiveThreads96="1" TotalMemory128="131072" ApproximateUnusedMemory128="0" ActiveThreads128="1" TotalMemory256="262144" ApproximateUnusedMemory256="0" ActiveThreads256="1" TotalMemory512="131072" ApproximateUnusedMemory512="0" ActiveThreads512="1" TotalMemory1024="131072" ApproximateUnusedMemory1024="0" ActiveThreads1024="1" TotalMemory2048="131072" ApproximateUnusedMemory2048="0" ActiveThreads2048="1" TotalMemory4096="131072" ApproximateUnusedMemory4096="0" ActiveThreads4096="1" TotalMemory8192="131072" ApproximateUnusedMemory8192="0" ActiveThreads8192="1" HugeArenaMemory="20297672" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
grep-out.txt-trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366831.421948" Type="MachineLoadDetail" ID="0000000000000000" User="1610" Nice="94" System="1530" Idle="208549" IOWait="297" IRQ="0" SoftIRQ="110" Steal="0" Guest="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
grep-out.txt:trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366831.421948" Type="ProcessMetrics" ID="0000000000000000" Elapsed="5.00015" CPUSeconds="0.141993" MainThreadCPUSeconds="0.132921" UptimeSeconds="15.1055" Memory="510849024" ResidentMemory="66211840" UnusedAllocatedMemory="0" MbpsSent="0" MbpsReceived="0" DiskTotalBytes="132224544768" DiskFreeBytes="122527625216" DiskQueueDepth="0" DiskIdleSeconds="4.97215" DiskReads="0" DiskWrites="26" DiskReadsCount="20560" DiskWritesCount="2838" DiskWriteSectors="440" DiskReadSectors="0" FileWrites="31" FileReads="0" CacheReadBytes="12288" CacheFinds="21" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="21" CacheReads="7" CacheHits="22" CacheMisses="0" CacheEvictions="0" ZoneID="8239170dec3560137f6277fe5c689e8f" MachineID="8239170dec3560137f6277fe5c689e8f" AIOSubmitCount="13" AIOCollectCount="13" AIOSubmitLag="1.21734e-07" AIODiskStall="9.7255e-07" CurrentConnections="0" ConnectionsEstablished="0" ConnectionsClosed="0" ConnectionErrors="0" TLSPolicyFailures="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />
grep-out.txt-trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366831.421948" Type="MemoryMetrics" ID="0000000000000000" TotalMemory16="393216" ApproximateUnusedMemory16="0" ActiveThreads16="3" TotalMemory32="393216" ApproximateUnusedMemory32="0" ActiveThreads32="3" TotalMemory64="1179648" ApproximateUnusedMemory64="0" ActiveThreads64="8" TotalMemory96="262080" ApproximateUnusedMemory96="0" ActiveThreads96="1" TotalMemory128="131072" ApproximateUnusedMemory128="0" ActiveThreads128="1" TotalMemory256="393216" ApproximateUnusedMemory256="0" ActiveThreads256="1" TotalMemory512="131072" ApproximateUnusedMemory512="0" ActiveThreads512="1" TotalMemory1024="131072" ApproximateUnusedMemory1024="0" ActiveThreads1024="1" TotalMemory2048="131072" ApproximateUnusedMemory2048="0" ActiveThreads2048="1" TotalMemory4096="131072" ApproximateUnusedMemory4096="0" ActiveThreads4096="1" TotalMemory8192="131072" ApproximateUnusedMemory8192="0" ActiveThreads8192="1" HugeArenaMemory="297632" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />
--
grep-out.txt-trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml-<Event Severity="10" Time="1621366836.422228" Type="MachineLoadDetail" ID="0000000000000000" User="1611" Nice="94" System="1531" Idle="209520" IOWait="297" IRQ="0" SoftIRQ="110" Steal="0" Guest="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" />^M
grep-out.txt:trace.192.168.49.3.4500.1621366816.OZHsXU.0.1.xml:<Event Severity="10" Time="1621366836.422228" Type="ProcessMetrics" ID="0000000000000000" Elapsed="5.00028" CPUSeconds="0.174887" MainThreadCPUSeconds="0.159564" UptimeSeconds="20.1057" Memory="511635456" ResidentMemory="66744320" UnusedAllocatedMemory="0" MbpsSent="0" MbpsReceived="0" DiskTotalBytes="132224544768" DiskFreeBytes="122527576064" DiskQueueDepth="0" DiskIdleSeconds="4.98428" DiskReads="0" DiskWrites="14" DiskReadsCount="20560" DiskWritesCount="2852" DiskWriteSectors="272" DiskReadSectors="0" FileWrites="18" FileReads="0" CacheReadBytes="8192" CacheFinds="14" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="14" CacheReads="2" CacheHits="12" CacheMisses="0" CacheEvictions="0" ZoneID="8239170dec3560137f6277fe5c689e8f" MachineID="8239170dec3560137f6277fe5c689e8f" AIOSubmitCount="8" AIOCollectCount="8" AIOSubmitLag="1.42513e-07" AIODiskStall="9.54414e-07" CurrentConnections="0" ConnectionsEstablished="0" ConnectionsClosed="0" ConnectionErrors="0" TLSPolicyFailures="0" Machine="192.168.49.3:4500" LogGroup="default" Roles="CC,CD,DD,MP,MS,RK,RV,SS,TL" TrackLatestType="Original" />

These particular events indicate that there are no policy failures (TLSPolicyFailures="0"); they only matched your grep because the ProcessMetrics event carries a counter with TLS in its name. Maybe you could try filtering out ProcessMetrics events in your grep and see what’s left.
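A minimal sketch of that filtering in Python, assuming one event per line as in the pasted output. The function names and the `HypotheticalTLSEvent` type in the sample are invented for illustration; they are not real FoundationDB identifiers.

```python
import re

# The TLSPolicyFailures counter as it appears in ProcessMetrics events.
POLICY_FAILURES = re.compile(r'TLSPolicyFailures="(\d+)"')


def without_process_metrics(lines):
    """Keep lines that mention TLS but are not periodic ProcessMetrics events."""
    return [
        line.rstrip("\r\n")
        for line in lines
        if "TLS" in line and 'Type="ProcessMetrics"' not in line
    ]


def max_policy_failures(lines):
    """Return the largest TLSPolicyFailures value seen, or None if the counter never appears."""
    values = [int(m.group(1)) for line in lines for m in POLICY_FAILURES.finditer(line)]
    return max(values) if values else None


# Shortened sample: one real-looking ProcessMetrics line, one made-up TLS event.
sample = [
    '<Event Severity="10" Type="ProcessMetrics" TLSPolicyFailures="0" />\r\n',
    '<Event Severity="20" Type="HypotheticalTLSEvent" Reason="example only" />\r\n',
    '<Event Severity="10" Type="MachineLoadDetail" User="1604" />\r\n',
]

remaining = without_process_metrics(sample)
worst = max_policy_failures(sample)
```

With the sample above, `remaining` holds only the made-up TLS event and `worst` is 0, which matches the reading that the ProcessMetrics lines report no policy failures. Stripping the trailing `\r` also removes the `^M` characters that show up when the output is viewed on Linux.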