fdbcli cannot connect while a separate, TLS-misconfigured fdb client is running?

Hi, I’m on FDB 6.3.15 with TLS enabled. I run a Go service that is an fdb client (using the fdb Go bindings), and I had a bug where I passed in the TLS options incorrectly. While debugging this in my test environment, I found that while the bad application was running, I couldn’t use fdbcli, even though fdbcli was using the correct TLS file paths. When I stopped the bad application, fdbcli worked again. Has anyone hit this before?

My Go fdb client was using these options: TLS_KEY_BYTES, TLS_CERT_BYTES, TLS_CA_BYTES. The bug was that we were passing the cert bytes for TLS_KEY_BYTES instead of the key bytes. The fdb client logged these errors:

<Event Severity="10" Time="1644278513.282748" DateTime="2022-02-08T00:01:53Z" Type="TLSConfigureError" ID="0000000000000000" What="use_private_key: no start line" Value="151584876" WhichMeans="error:0909006C:PEM routines:get_name:no start line" Machine="127.0.0.1:1" LogGroup="default" />
<Event Severity="10" Time="1644278513.282748" DateTime="2022-02-08T00:01:53Z" Type="Net2TLSInitError" ID="0000000000000000" Error="tls_error" ErrorDescription="TLS error" ErrorCode="2107" Machine="127.0.0.1:1" LogGroup="default" />
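
For reference, a check along these lines would probably have caught the key/cert mix-up before the bytes ever reached the client. This is only a minimal sketch using the Go standard library: `checkPEMKind` is a hypothetical helper (not part of the fdb Go bindings), the PEM data is a placeholder, and I'm assuming the check runs just before the bytes are handed to the TLS_KEY_BYTES option.

```go
package main

import (
	"encoding/pem"
	"fmt"
	"strings"
)

// checkPEMKind returns an error unless the first PEM block found in data has
// a type ending in wantSuffix (for example "PRIVATE KEY" or "CERTIFICATE").
// Hypothetical helper for illustration; not part of the fdb Go bindings.
func checkPEMKind(data []byte, wantSuffix string) error {
	block, _ := pem.Decode(data)
	if block == nil {
		return fmt.Errorf("no PEM block found, want %q", wantSuffix)
	}
	if !strings.HasSuffix(block.Type, wantSuffix) {
		return fmt.Errorf("PEM block is %q, want %q", block.Type, wantSuffix)
	}
	return nil
}

func main() {
	// Placeholder PEM data for illustration only; not a real certificate.
	certPEM := []byte("-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n")

	// Simulate the bug: certificate bytes supplied where the key is expected.
	if err := checkPEMKind(certPEM, "PRIVATE KEY"); err != nil {
		fmt.Println("refusing to set TLS_KEY_BYTES:", err)
		return
	}
	// Only at this point would the bytes be passed on to the fdb client
	// (assumed here to be via the TLS_KEY_BYTES network option).
}
```

Matching on the suffix is deliberate, so that "RSA PRIVATE KEY" and "EC PRIVATE KEY" blocks pass as well as plain "PRIVATE KEY". It only checks the PEM label, not that the key actually matches the certificate.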

When the bad application was running, fdbcli could not connect to the cluster; it listed the coordinator as “unreachable”. In the fdbcli logs I saw:

<Event Severity="20" Time="1644266837.015345" DateTime="2022-02-07T20:47:17Z" Type="N2_ConnectHandshakeError" ID="0219362e0161359f" SuppressedEventCount="0" ErrorCode="104" Message="Connection reset by peer" Machine="127.0.0.1:3099" LogGroup="default" />
<Event Severity="10" Time="1644266837.015345" DateTime="2022-02-07T20:47:17Z" Type="ConnectionTimedOut" ID="0219362e0161359f" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500:tls" Machine="127.0.0.1:3099" LogGroup="default" />
<Event Severity="10" Time="1644266837.015345" DateTime="2022-02-07T20:47:17Z" Type="ConnectionClosed" ID="0219362e0161359f" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="127.0.0.1:4500:tls" Machine="127.0.0.1:3099" LogGroup="default" />

The coordinator fdbserver had the same error logs in both cases (when the bad client tried to connect, and also when fdbcli tried to connect while the bad client was running):

<Event Severity="10" Time="1644275302.401044" DateTime="2022-02-07T23:08:22Z" Type="ConnectionFrom" ID="af314dd7cb821090" SuppressedEventCount="0" FromAddress="127.0.0.1:38156:tls" ListenAddress="0.0.0.0:4500:tls" Machine="127.0.0.1:4500" LogGroup="default" Roles="CC,CD,RV" />
<Event Severity="20" Time="1644275302.403588" DateTime="2022-02-07T23:08:22Z" Type="N2_AcceptHandshakeError" ID="0000000000000000" ErrorCode="337100999" ErrorMsg="peer did not return a certificate" BackgroundThread="1" Machine="127.0.0.1:4500" LogGroup="default" Roles="CC,CD,RV" />
<Event Severity="10" Time="1644275302.403611" DateTime="2022-02-07T23:08:22Z" Type="IncomingConnectionError" ID="af314dd7cb821090" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="127.0.0.1:38156:tls" Machine="127.0.0.1:4500" LogGroup="default" Roles="CC,CD,RV" />

Obviously it makes sense that my Go fdb client couldn’t connect, but I don’t understand why this also stopped fdbcli from working. The TLS files used by fdbcli are used only by fdbcli and are not touched by the Go application. My steps were:

  1. Run fdbcli while the Go application is running - coordinator unreachable
  2. Stop the Go application
  3. Exit fdbcli
  4. Run fdbcli - coordinator reachable

I’m running everything on the same machine (Go application, fdbcli, fdbservers), and my cluster has just 1 coordinator.