Bad record mac when trying to use TLS with FoundationDB on Kubernetes

Hi folks,

I’m working on updating a FoundationDB deployment on Kubernetes to use TLS, but the database remains unavailable and decryption seems to fail on several peers, with the following trace messages:

<Event Severity="20" Time="1752494789.352593" DateTime="2025-07-14T12:06:29Z" Type="N2_ReadError" ID="7223bda2958f8423" SuppressedEventCount="0" PeerAddr="172.22.175.145:4500:tls" PeerAddress="172.22.175.145:4500:tls" ErrorCode="336130329" Message="decryption failed or bad record mac (SSL routines, ssl3_get_record)" ThreadID="255555187182270082" Machine="172.22.175.145:88" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-12058051743462299474" />
<Event Severity="20" Time="1752494789.352708" DateTime="2025-07-14T12:06:29Z" Type="N2_ReadError" ID="1a7e49b052c07e46" SuppressedEventCount="0" PeerAddr="172.22.175.145:36550:tls" PeerAddress="172.22.175.145:36550:tls" ErrorCode="336151548" Message="sslv3 alert bad record mac (SSL routines, ssl3_read_bytes)" ThreadID="4397127099531056342" Machine="172.22.175.145:4500" LogGroup="fdb-cluster" Roles="CC" />
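
For anyone reading these events: the two ErrorCode values look like packed OpenSSL error codes. The sketch below unpacks them, assuming the OpenSSL 1.1.x layout (8-bit library, 12-bit function, 12-bit reason); that layout is an assumption about the library FDB is linked against, and the helper name is made up for illustration. Under that assumption, reason 281 is a local "decryption failed or bad record mac" failure, while reason 1020 is the alert offset of 1000 plus TLS alert 20 (bad_record_mac), meaning the peer reported the failure.

```python
# Sketch: unpack the packed error codes from the trace events above.
# Assumes the OpenSSL 1.1.x layout; OpenSSL 3.x packs errors differently.
SSL_LIB = 20                 # ERR_LIB_SSL
ALERT_REASON_OFFSET = 1000   # SSL reasons above this mirror a received TLS alert


def decode_openssl_error(code: int) -> dict:
    """Split a packed error code into library, function, and reason numbers."""
    lib = (code >> 24) & 0xFF
    func = (code >> 12) & 0xFFF
    reason = code & 0xFFF
    received_alert = None
    if lib == SSL_LIB and reason > ALERT_REASON_OFFSET:
        # The reason encodes an alert sent by the peer, not a local failure.
        received_alert = reason - ALERT_REASON_OFFSET
    return {
        "hex": f"{code:08X}",
        "lib": lib,
        "func": func,
        "reason": reason,
        "received_alert": received_alert,
    }


for code in (336130329, 336151548):
    print(decode_openssl_error(code))
```

Read this way, the fdbcli side (first event) fails to authenticate a record locally, and the cluster controller side (second event) only sees the resulting alert.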

I’ve verified that these are actual pod IPs and validated the certificates using openssl locally.
The certificates contain the chain leading up to the CA, which is a self-signed certificate generated by our infrastructure.

Some additional details (and I can provide more if needed):

  • Operator version: 2.8.0. I had to fork this to make a small change to switch the HealthProbeBindAddress to ipv4 but it’s otherwise unchanged.
  • FDB version: 7.3.63
  • useDNSInClusterFile: true
  • enableTls: true

With the following environment variables in all containers (operator and fdb):

{
    name: 'FDB_TLS_CA_FILE',
    value: '/var/fdb-certs/ca.crt',
},
{
    name: 'FDB_TLS_CERTIFICATE_FILE',
    value: '/var/fdb-certs/tls.crt',
},
{
    name: 'FDB_TLS_KEY_FILE',
    value: '/var/fdb-certs/tls.key',
}
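
A quick way to sanity-check these three variables from inside a container is sketched below. It only confirms that the paths exist and that OpenSSL (via Python's ssl module) accepts the CA file and the certificate/key pair; it does not reproduce how fdbserver or fdbcli load them. The function name check_tls_env is hypothetical.

```python
import os
import ssl

# The three variables from the container spec above.
FDB_TLS_VARS = ("FDB_TLS_CA_FILE", "FDB_TLS_CERTIFICATE_FILE", "FDB_TLS_KEY_FILE")


def check_tls_env(env=os.environ):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    paths = {}
    for name in FDB_TLS_VARS:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{name} points to a missing file: {path}")
        else:
            paths[name] = path
    if len(paths) == len(FDB_TLS_VARS):
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        try:
            ctx.load_verify_locations(cafile=paths["FDB_TLS_CA_FILE"])
            # Raises ssl.SSLError if the private key does not match the certificate.
            ctx.load_cert_chain(paths["FDB_TLS_CERTIFICATE_FILE"], paths["FDB_TLS_KEY_FILE"])
        except (ssl.SSLError, OSError) as exc:
            problems.append(f"OpenSSL rejected the files: {exc}")
    return problems


if __name__ == "__main__":
    for problem in check_tls_env() or ["no problems found"]:
        print(problem)
```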

If I try to connect to the database using fdbcli even from the cluster controller, I get the following:

[root@fdb-cluster-cluster-controller-2692/foundationdb fdb]# fdbcli
Using cluster file `/var/dynamic-conf/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `/var/dynamic-conf/fdb.cluster'.

Unable to communicate with the cluster controller at 172.18.160.232:4500:tls to
get status.

Configuration:
  Redundancy mode        - unknown
  Storage engine         - unknown
  Log engine             - unknown
  Encryption at-rest     - disabled
  Coordinators           - unknown
  Usable Regions         - unknown

Cluster:
  FoundationDB processes - unknown
  Zones                  - unknown
  Machines               -
  Machines               - unknown

Data:
  Replication health     - unknown
  Moving data            - unknown
  Sum of key-value sizes - unknown
  Disk space used        - unknown

Operating space:
  Unable to retrieve operating space status

Workload:
  Read rate              - unknown
  Write rate             - unknown
  Transactions started   - unknown
  Transactions committed - unknown
  Conflict rate          - unknown

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:

Coordination servers:
  fdb-cluster-log-16039.fdb-cluster.doris.svc.clusterdomain:4500:tls  (reachable)
  fdb-cluster-storage-65093.fdb-cluster.doris.svc.clusterdomain:4500:tls  (reachable)
  fdb-cluster-storage-76380.fdb-cluster.doris.svc.clusterdomain:4500:tls  (reachable)

Client time: 07/14/25 13:16:41

I’d appreciate any clues or suggestions on what to look out for that might be causing this problem.

Adding some additional details after testing for a while with this.

I’ve come to the conclusion that for some reason, the problem only happens when trying to connect to the cluster controller.

If I try to set a key in a random pod and read it from a different pod (even though the database says it’s unavailable), I’m able to get the results back.

I can’t ignore the error because it’s causing operator reconciliation problems, sadly.

Additionally, I tested the network, the pod IPs, and the certificates by running an openssl server on the cluster controller and an openssl client on a random storage pod. I’m able to connect, validate the client and server certificates, and send data with a setup like this:

On the controller pod:

openssl s_server -accept 4433 -cert /var/fdb-certs/tls.crt -key /var/fdb-certs/tls.key -CAfile /var/fdb-certs/ca.crt --verify 10

On the storage pod:

openssl s_client -connect {sameIpFdbIsTryingToUse}:4433 -cert /var/fdb-certs/tls.crt -key /var/fdb-certs/tls.key -CAfile /var/fdb-certs/ca.crt

I’ve also pointed the same client command at the controller’s FDB port, and I get the output below. I’m not sure whether the error at the end is an actual problem; the handshake itself seems to work:

openssl s_client -connect 172.22.246.74:4500 -CAfile /var/fdb-certs/ca.crt -cert /var/fdb-certs/tls.crt -key /var/fdb-certs/tls.key -showcerts -verify_return_error
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 CN = fdb-cluster-doris-internal
verify return:1
depth=0 CN = fdb-cluster
verify return:1
---
Certificate chain
 0 s:CN = fdb-cluster
   i:CN = fdb-cluster-doris-internal
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Jul 15 16:13:06 2025 GMT; NotAfter: Aug 14 16:13:06 2025 GMT
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
 1 s:CN = fdb-cluster-doris-internal
   i:CN = fdb-cluster-doris-internal
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Jul 10 16:07:01 2025 GMT; NotAfter: Jul  3 16:07:01 2055 GMT
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
---
Server certificate
subject=CN = fdb-cluster
issuer=CN = fdb-cluster-doris-internal
---
No client certificate CA names sent
Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:0x03+0x02:RSA+SHA224:0x01+0x02
Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 2276 bytes and written 2345 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: C41CCFBB83A95635D0D9E8016BCC39E2973E8B5787DBA6A77FD181CF34303C9F
    Session-ID-ctx:
    Resumption PSK: FEC2CC3AF47F22DAF39559F92F908CB985595B769B0B8A9B20ED96A7BE3FA6C2E48D9CC661CF61A9586E5B3716AE6F84
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - d9 d2 93 04 72 db bb 2b-d5 9b a9 e1 86 3c a4 ab   ....r..+.....<..
    0010 - bc ec 61 7f dc 37 0c a9-1d 75 c9 16 81 a1 f6 f5   ..a..7...u......
    0020 - d7 e7 4f 2c 5c 6b 31 cb-b8 b0 2a 05 02 77 00 c6   ..O,\k1...*..w..
    0030 - 28 3c c3 c4 65 96 a1 7f-b8 07 2a 47 0e cb 29 f8   (<..e.....*G..).
    0040 - 3d 99 ef f3 a5 c5 de 26-37 5c 3d ee 4b b9 97 5b   =......&7\=.K..[
    0050 - f2 e8 14 30 41 6a be 73-ab 58 af d3 f6 e4 57 e1   ...0Aj.s.X....W.
    0060 - e0 9c c1 98 17 37 92 87-aa a5 1c b4 97 67 15 ac   .....7.......g..
    0070 - ab 27 12 bb 0f a3 cd 57-a6 36 1c 50 a5 54 4d 86   .'.....W.6.P.TM.
    0080 - 20 aa df 39 62 d6 5d c4-da f0 8d c6 31 0d 1d 1a    ..9b.].....1...
    0090 - 0f 21 93 7d 79 12 1a 5b-f3 c2 e0 00 64 20 b8 c9   .!.}y..[....d ..
    00a0 - cd cf 2c eb dd e6 29 1c-75 eb 02 f7 74 62 c5 73   ..,...).u...tb.s
    00b0 - dc 6f c9 46 cc dd 44 47-54 7f 49 ec b8 f1 1e b4   .o.F..DGT.I.....
    00c0 - 82 58 1a 75 56 82 7d cf-c8 a7 07 e4 63 a2 ab 98   .X.uV.}.....c...
    00d0 - de 3d 02 8e fd a3 d3 d2-20 b7 b0 b5 0e eb ba 94   .=...... .......
    00e0 - 49 2e 4e 71 6c 9e 78 37-02 ce 9a 40 48 c7 16 e0   I.Nql.x7...@H...
    00f0 - bc 2a 50 a6 4e 1f ce 36-d4 db 66 75 06 3c 0d 8f   .*P.N..6..fu.<..
    0100 - e9 7f 8c 4b 8a 1d 9d 92-3c 88 34 3b ba 89 76 8a   ...K....<.4;..v.
    0110 - dc a7 f5 77 af 98 c9 4e-cd cd a6 bd 80 5f 6a 2e   ...w...N....._j.
    0120 - ff 87 4e 55 95 84 42 69-d8 ed 1b 5a 10 c7 56 ca   ..NU..Bi...Z..V.
    0130 - b4 16 46 5f 7f 8f c9 8a-64 a8 69 af 12 bd 70 89   ..F_....d.i...p.
    0140 - 01 f4 4f 68 93 b4 4f 1f-d9 ba dc 3d 43 b5 25 ea   ..Oh..O....=C.%.
    0150 - 0b 2a 23 99 9f ce 04 49-99 d9 ba 4c 1a fd e2 4c   .*#....I...L...L
    0160 - b3 71 1d 24 c5 8a 89 c8-f7 8d 87 f3 21 e4 c6 76   .q.$........!..v
    0170 - f3 8b 1f 2d 1d e7 d0 9a-94 22 51 74 fd ef 22 87   ...-....."Qt..".
    0180 - 94 40 5e af 48 fa 44 ca-d0 52 57 0d 1a 1b 0c 94   .@^.H.D..RW.....
    0190 - 6a 87 71 46 ff 4a 0d d3-9c 41 a7 0a ab 10 c3 02   j.qF.J...A......
    01a0 - dc b0 9b 6a 7c 2f 65 6a-59 62 10 c6 7d 38 0c c9   ...j|/ejYb..}8..
    01b0 - c5 a4 63 8b f0 ff f4 25-54 84 88 99 ef 92 aa e6   ..c....%T.......
    01c0 - 62 e6 75 0e 98 7d d2 06-69 d1 43 3b e5 f7 85 e0   b.u..}..i.C;....
    01d0 - dd 4c d1 73 2a cb 63 d2-08 94 28 4f 67 98 c0 36   .L.s*.c...(Og..6
    01e0 - 8d 79 bf c4 f5 8c c3 a4-47 07 d5 b9 18 d2 40 66   .y......G.....@f
    01f0 - d1 b9 46 2b 9b b1 65 dc-14 34 c6 38 cc 67 16 1d   ..F+..e..4.8.g..
    0200 - e4 f1 cf d6 5c 1f 2d 49-02 6a 2d 74 d5 99 2e e7   ....\.-I.j-t....
    0210 - 33 6f e2 18 4d be fb 64-e4 ba aa c2 06 73 b5 40   3o..M..d.....s.@
    0220 - 93 7a 86 ae 27 fe 1f e8-46 77 b6 87 d0 c5 b4 38   .z..'...Fw.....8
    0230 - 95 42 9b 78 0b ef d4 46-2b 1b 18 55 35 23 7e de   .B.x...F+..U5#~.
    0240 - 67 da d6 0f 3c eb 85 84-79 d3 07 cd c9 ef f3 3a   g...<...y......:
    0250 - a2 21 7a e5 1d e2 32 9d-02 47 96 1a 4e 67 7b 94   .!z...2..G..Ng{.
    0260 - 1b e5 45 13 0d 01 cd 36-f7 99 bc 1f 3c d1 d7 01   ..E....6....<...
    0270 - b4 3e 79 ef 11 1b 8f 95-37 65 6c e0 86 1e d0 ca   .>y.....7el.....
    0280 - 45 dc 25 41 ee 55 12 ba-e3 b8 c5 7f 95 4a 9f 56   E.%A.U.......J.V
    0290 - 29 61 85 ac 5c 83 c7 81-7d a6 34 0a e6 ae 77 42   )a..\...}.4...wB
    02a0 - 2f 3d db b9 b0 63 5d 40-b1 b7 94 97 21 f1 08 99   /=...c]@....!...
    02b0 - 81 a0 30 94 e8 5d 64 3a-91 31 95 9e d5 ec ed 7e   ..0..]d:.1.....~
    02c0 - 1e a8 48 8d 78 e4 ea 99-71 f7 a5 15 42 a3 fe fc   ..H.x...q...B...
    02d0 - fe 84 19 55 a6 ca ac cf-07 ec c8 22 ba 34 58 b0   ...U.......".4X.
    02e0 - 78 15 c2 1b e0 ea 65 30-6b c0 08 8d 90 71 ae 08   x.....e0k....q..
    02f0 - 97 f2 70 71 6c b7 15 d7-94 94 75 58 25 ab e3 e0   ..pql.....uX%...
    0300 - 00 ff a0 95 de bf 4a 8c-fc b4 f4 8f b6 59 cd 8e   ......J......Y..
    0310 - 67 3d 60 19 98 e4 08 f2-4f 8e 87 44 ba a7 58 84   g=`.....O..D..X.
    0320 - 89 32 45 56 2b fc 05 da-94 44 ef 66 93 55 c4 79   .2EV+....D.f.U.y
    0330 - 8e 35 74 1e e7 39 c5 90-73 a5 75 85 12 49 ff 43   .5t..9..s.u..I.C
    0340 - 1b 6a 8c 1b 5e cb 8b 8b-0a db 23 56 a4 ed 00 cb   .j..^.....#V....
    0350 - 10 e9 a8 8a 5f 07 ce 09-90 20 ea f8 94 a8 56 b8   ...._.... ....V.
    0360 - 98 d2 6c 9a 83 df 2d ab-6e a4 d3 d9 6c bc b6 95   ..l...-.n...l...
    0370 - b1 60 e1 f9 f4 7d ba 3e-69 58 56 e0 59 86 fd 0e   .`...}.>iXV.Y...
    0380 - 35 db 5a 89 a5 b3 c4 ad-50 96 d9 46 b3 88 a7 77   5.Z.....P..F...w
    0390 - aa 17 a3 84 a2 f8 72 0d-f4 87 ec 39 e4 e7 a5 0d   ......r....9....
    03a0 - 81 5d 9c bd c7 43 66 db-d4 1f 89 01 57 28 97 03   .]...Cf.....W(..
    03b0 - 39 21 d3 4c 9f b2 60 a4-f0 92 0a cc b5 00 03 f7   9!.L..`.........
    03c0 - 7e f3 00 c3 78 e1 4e 73-7c 19 fc eb 56 4d 89 aa   ~...x.Ns|...VM..
    03d0 - 53 c1 49 a5 37 ef c3 15-cb 58 8c 85 c5 d4 4e eb   S.I.7....X....N.
    03e0 - dc 44 2a f4 62 bf 51 66-9f 47 fd 12 fc e3 d0 6d   .D*.b.Qf.G.....m
    03f0 - 54 c5 fd 68 7f a5 78 c9-44 46 6d f9 17 6d a5 76   T..h..x.DFm..m.v
    0400 - 6a ce 4b 32 04 52 d0 58-2e 51 d1 11 d7 9e af 4d   j.K2.R.X.Q.....M

    Start Time: 1752611426
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: 82AD945726B8BF185EB24C0F1773C6694328470185CDEAF72AB930782CDD40D6
    Session-ID-ctx:
    Resumption PSK: 4EB1682D197EDD70FB5EA608CCE674D54E8876E2BB6E767CF57F83C32D5BF568035CDC626DA427F5C574EEDCB567E56A
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - d9 d2 93 04 72 db bb 2b-d5 9b a9 e1 86 3c a4 ab   ....r..+.....<..
    0010 - 4e 63 e0 98 0e 18 17 9f-fa 0a 50 3f 6c 86 16 33   Nc........P?l..3
    0020 - 9e cc cd fb 07 76 f1 72-d4 ad bc 40 bf 4a d8 c1   .....v.r...@.J..
    0030 - 7c f6 d5 9e 89 cd f1 a1-b0 26 69 1b fa 63 b9 02   |........&i..c..
    0040 - af 42 83 c3 49 29 f6 ba-5f 09 e1 b3 f4 64 5c 43   .B..I).._....d\C
    0050 - 36 8a 0c cd 26 4c 44 04-ae 36 3c 3c e7 c6 3a 3a   6...&LD..6<<..::
    0060 - 6e 61 8c b1 b4 76 33 e2-0f 79 ad 4a 2e b6 a8 18   na...v3..y.J....
    0070 - ac 15 1f fd 52 76 72 cc-04 b1 a9 8c b6 80 9c 7e   ....Rvr........~
    0080 - 7a f4 a2 96 f7 90 9d ec-f6 6a b6 5e e3 bc dd d3   z........j.^....
    0090 - 65 7c e4 58 e3 75 fd 86-3f 9d 3f 56 a1 22 68 49   e|.X.u..?.?V."hI
    00a0 - d8 f0 3d 8d fc 86 a6 b4-18 5a 50 d8 89 25 46 1f   ..=......ZP..%F.
    00b0 - be ed be c2 f2 8d 6a 70-0d e7 8c 48 73 96 20 71   ......jp...Hs. q
    00c0 - 3e 6d 85 22 22 15 08 0c-3e 51 b8 23 7e 54 c3 f6   >m.""...>Q.#~T..
    00d0 - bb 15 35 21 4d fb 04 85-db 93 4b 7d da 6f 75 1e   ..5!M.....K}.ou.
    00e0 - 5a 34 f4 14 62 5d 4c 58-b5 26 fd 80 57 29 46 1e   Z4..b]LX.&..W)F.
    00f0 - 51 11 f4 b1 29 98 96 48-26 b1 8f 0c 6d 0f c8 c8   Q...)..H&...m...
    0100 - 74 f4 20 b8 bb 7d b8 5b-ce af 4c 41 c4 c0 a7 47   t. ..}.[..LA...G
    0110 - cc aa 8e 70 b5 17 f6 e1-4f 6c 0d c7 88 df 9c 45   ...p....Ol.....E
    0120 - 3b 32 21 73 d0 76 4c 34-e7 96 f1 64 fe 2e a3 d5   ;2!s.vL4...d....
    0130 - ef 0e 60 fa f9 d5 4e 25-e7 96 c6 fa 2d 10 79 70   ..`...N%....-.yp
    0140 - 2b d9 09 6d 13 e8 1c d7-5c 15 15 f3 12 3e 7c de   +..m....\....>|.
    0150 - 91 cf bf 8b 1b f3 56 aa-c7 e2 40 59 0a 93 4c 49   ......V...@Y..LI
    0160 - eb a8 02 06 8e b9 9c 45-9a ea 09 5d 80 07 96 01   .......E...]....
    0170 - 57 9e f6 7c f9 2e 4e 08-0c d6 6b ad 0f 0c 81 24   W..|..N...k....$
    0180 - dd 60 a6 ee ee 6e 64 c1-32 d0 41 2b ec 38 73 6c   .`...nd.2.A+.8sl
    0190 - f7 83 e3 98 b0 62 c3 53-b9 ed b8 41 60 9f 1d be   .....b.S...A`...
    01a0 - b6 8d 68 05 fc 4e e7 e9-72 85 fd 18 7c ab 2b f1   ..h..N..r...|.+.
    01b0 - 10 be 22 b7 54 b7 07 ff-ab 71 d5 32 70 32 fb b1   ..".T....q.2p2..
    01c0 - 46 ef 60 88 4b aa e2 a7-04 01 e4 38 e1 2a 83 0e   F.`.K......8.*..
    01d0 - dc c7 71 29 e6 f5 fd 24-62 60 7f 4f 86 35 c6 84   ..q)...$b`.O.5..
    01e0 - 3f 14 54 c6 ac a1 94 d5-64 2b ef d5 0f 0f 99 37   ?.T.....d+.....7
    01f0 - ae 3d f7 09 95 ff f4 ea-89 6f c4 2e 2f 72 9e b2   .=.......o../r..
    0200 - ba 5d d2 b9 87 1e 73 35-3a 3b 45 7a 89 3c fd 7e   .]....s5:;Ez.<.~
    0210 - 9f 5d 0d 7e 1a ce 0c 70-30 ba 36 d3 a2 be 62 28   .].~...p0.6...b(
    0220 - b3 b6 b0 bf 84 83 be 83-3e 63 61 28 3c de 67 0b   ........>ca(<.g.
    0230 - 06 cb f5 db 2c 37 2d 08-ba 4c 08 6b b0 b0 68 ad   ....,7-..L.k..h.
    0240 - 85 7e 0d 3d c9 db 3d ff-93 10 fc f4 9a 35 8e 3e   .~.=..=......5.>
    0250 - 23 5e 6a 90 53 48 cf 8b-65 bf ad ee b5 ea 4c 72   #^j.SH..e.....Lr
    0260 - 24 90 37 9b 0a d1 01 92-dd 47 9d 1c b3 50 06 c4   $.7......G...P..
    0270 - 92 ae 93 65 46 4a 57 39-7a cd f6 97 3c 2f a8 58   ...eFJW9z...</.X
    0280 - 85 4e 59 e4 01 56 c3 4f-f9 1a 4d 3e a7 7c 20 99   .NY..V.O..M>.| .
    0290 - 41 91 01 91 44 ed 60 d9-07 b3 f9 9c 0c da ca b9   A...D.`.........
    02a0 - 30 4b 8d 29 38 67 94 ac-3c 10 fc 87 2a fd 35 5f   0K.)8g..<...*.5_
    02b0 - 17 f9 80 2c 65 dc e4 ca-f6 cd d2 4e f3 13 df 91   ...,e......N....
    02c0 - c8 94 80 84 2a 64 e4 44-60 b1 9a bf 60 c6 46 2a   ....*d.D`...`.F*
    02d0 - 35 6f e9 3a da f5 68 2e-3a c9 30 43 74 14 ca 27   5o.:..h.:.0Ct..'
    02e0 - 45 31 19 4f a1 fb 8f 22-ab fd 00 50 01 1f ad 92   E1.O..."...P....
    02f0 - 2d fe ba f8 53 cb a8 89-2a 6e 97 68 9c 5a 80 b5   -...S...*n.h.Z..
    0300 - bc fb 4e 58 22 90 20 84-7b 00 18 9e 72 fe 86 3e   ..NX". .{...r..>
    0310 - b4 f0 e5 6e 69 48 f9 25-38 51 7c 96 d4 a6 3d 4d   ...niH.%8Q|...=M
    0320 - 9e 9a 08 97 5f c0 91 a1-6d 77 58 4b 74 da cf e0   ...._...mwXKt...
    0330 - ec fc 51 99 17 a3 f7 e8-b9 b9 a5 f5 b3 84 53 84   ..Q...........S.
    0340 - 82 af d5 fc a9 d7 4d 99-89 03 ea f1 4d 07 f6 18   ......M.....M...
    0350 - c4 ce a8 27 2d b6 d1 fd-41 8e 4b e2 94 c3 99 97   ...'-...A.K.....
    0360 - 01 75 16 71 b2 dc 8d 1f-ce 6e 9d e9 b3 63 69 a3   .u.q.....n...ci.
    0370 - c7 6b 6c 83 4c 65 8a 40-28 66 94 a4 89 8e 88 c2   .kl.Le.@(f......
    0380 - cf 81 ee 61 46 36 7d 8c-bf cc d1 f5 f6 8a 80 fb   ...aF6}.........
    0390 - 2c ea 5c 7d de 13 ec 26-d8 bc fa 7d a0 19 45 a0   ,.\}...&...}..E.
    03a0 - f3 a3 30 26 e3 b0 d5 5d-28 ec dd 1e 1e d9 34 7a   ..0&...](.....4z
    03b0 - 40 05 65 c3 31 0d 19 94-de f1 90 d2 28 a6 bd 76   @.e.1.......(..v
    03c0 - 53 d4 43 d3 30 9b c7 b4-ef d3 a0 b8 79 fd 2e 90   S.C.0.......y...
    03d0 - 63 d5 71 83 ab e3 91 1f-2f 69 b6 bc c6 68 e4 a2   c.q...../i...h..
    03e0 - 05 ff 28 25 f0 c5 d5 a5-40 5f 67 cb b4 aa 5e 37   ..(%....@_g...^7
    03f0 - 41 7f fb 00 45 ff 4a e6-e2 5c e6 9e 68 ff 71 62   A...E.J..\..h.qb
    0400 - 22 8a 41 d8 9f cf 83 b9-11 73 b7 54 f6 41 9c ad   ".A......s.T.A..

    Start Time: 1752611426
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK
40CCD85A467F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
*** WARNING: The exit code was: 1 ***

I’ve also traced the moment when I run fdbcli from a random pod. The controller logs the following for that pod’s IP:

tail -f trace.172.22.246.74.4500.1752596046.fepwcO.0.6.xml | grep "172.22.150.122"
<Event Severity="10" Time="1752610536.278328" DateTime="2025-07-15T20:15:36Z" Type="ConnectionFrom" ID="5a223cc3c3b4b504" SuppressedEventCount="0" FromAddress="172.22.150.122:36260:tls" ListenAddress="172.22.246.74:4500:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.340134" DateTime="2025-07-15T20:15:36Z" Type="ConnectionEstablished" ID="5a223cc3c3b4b504" SuppressedEventCount="0" Peer="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ConnectionId="0" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.340134" DateTime="2025-07-15T20:15:36Z" Type="NotifyAddressHealthy" ID="0000000000000000" SuppressedEventCount="0" Address="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.340134" DateTime="2025-07-15T20:15:36Z" Type="IncomingConnection" ID="5a223cc3c3b4b504" FromAddr="172.22.150.122:36260:tls" CanonicalAddr="172.22.150.122:36260:tls" IsPublic="0" Trusted="1" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="20" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="N2_ReadError" ID="5a223cc3c3b4b504" SuppressedEventCount="0" PeerAddr="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ErrorCode="336151548" Message="sslv3 alert bad record mac (SSL routines, ssl3_read_bytes)" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="NotifyAddressFailed" ID="0000000000000000" SuppressedEventCount="0" Address="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="ConnectionClosed" ID="5a223cc3c3b4b504" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="3" PeerAddr="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610540.318695" DateTime="2025-07-15T20:15:40Z" Type="ConnectionFrom" ID="b324e16fcab7ec4c" SuppressedEventCount="0" FromAddress="172.22.150.122:23922:tls" ListenAddress="172.22.246.74:4500:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="20" Time="1752610540.372360" DateTime="2025-07-15T20:15:40Z" Type="N2_ReadError" ID="b324e16fcab7ec4c" SuppressedEventCount="0" PeerAddr="172.22.150.122:23922:tls" PeerAddress="172.22.150.122:23922:tls" ErrorCode="1" Message="stream truncated" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610540.372360" DateTime="2025-07-15T20:15:40Z" Type="IncomingConnectionError" ID="b324e16fcab7ec4c" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="172.22.150.122:23922:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />

Interestingly, the fdbcli running on the storage node also complains about the same problem:

<Event Severity="20" Time="1752604993.099479" DateTime="2025-07-15T18:43:13Z" Type="N2_ReadError" ID="2bb31bbc9cd8401c" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ErrorCode="336130329" Message="decryption failed or bad record mac (SSL routines, ssl3_get_record)" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752604993.099479" DateTime="2025-07-15T18:43:13Z" Type="ConnectionClosed" ID="2bb31bbc9cd8401c" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752604998.045402" DateTime="2025-07-15T18:43:18Z" Type="ProcessMetrics" ID="0000000000000000" Elapsed="4.99813" CPUSeconds="0.020354" MainThreadCPUSeconds="0.016562" UptimeSeconds="5.00002" Memory="563318784" ResidentMemory="25124864" UnusedAllocatedMemory="0" MbpsSent="0.0115627" MbpsReceived="0.0398037" DiskTotalBytes="0" DiskFreeBytes="0" DiskQueueDepth="0" DiskIdleSeconds="0" DiskReads="0" DiskReadSeconds="0" DiskWrites="0" DiskWriteSeconds="0" DiskReadsCount="0" DiskWritesCount="0" DiskWriteSectors="0" DiskReadSectors="0" FileWrites="0" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" CacheHits="0" CacheMisses="0" CacheEvictions="0" DCID="[not set]" ZoneID="[not set]" MachineID="[not set]" Version="[not set]" AIOSubmitCount="0" AIOCollectCount="0" AIOSubmitLag="0" AIODiskStall="0" CurrentConnections="3" ConnectionsEstablished="0.8003" ConnectionsClosed="0.200075" ConnectionErrors="0" TLSPolicyFailures="0" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" TrackLatestType="Original" />
<Event Severity="20" Time="1752605000.424030" DateTime="2025-07-15T18:43:20Z" Type="N2_ReadError" ID="11d39bbc9a4bd4ac" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ErrorCode="336130329" Message="decryption failed or bad record mac (SSL routines, ssl3_get_record)" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752605000.424030" DateTime="2025-07-15T18:43:20Z" Type="ConnectionClosed" ID="11d39bbc9a4bd4ac" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752605000.424030" DateTime="2025-07-15T18:43:20Z" Type="CLICommandLog" ID="7316d385f672dd1a" Command="status details" IsError="0" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="30" Time="1752605001.962707" DateTime="2025-07-15T18:43:21Z" Type="ErrorSavingCliHistory" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" Filename="/root/.fdbcli_history" UnixErrorCode="1e" UnixError="Read-only file system" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752605001.962707" DateTime="2025-07-15T18:43:21Z" Type="ProcessMetrics" ID="0000000000000000" Elapsed="4.00872" CPUSeconds="0.011377" MainThreadCPUSeconds="0.00858" UptimeSeconds="8.91732" Memory="563298304" ResidentMemory="25825280" UnusedAllocatedMemory="519328" MbpsSent="0.00334471" MbpsReceived="0.0315791" DiskTotalBytes="0" DiskFreeBytes="0" DiskQueueDepth="0" DiskIdleSeconds="0" DiskReads="0" DiskReadSeconds="0" DiskWrites="0" DiskWriteSeconds="0" DiskReadsCount="0" DiskWritesCount="0" DiskWriteSectors="0" DiskReadSectors="0" FileWrites="0" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" CacheHits="0" CacheMisses="0" CacheEvictions="0" DCID="[not set]" ZoneID="[not set]" MachineID="[not set]" Version="[not set]" AIOSubmitCount="0" AIOCollectCount="0" AIOSubmitLag="0" AIODiskStall="0" CurrentConnections="3" ConnectionsEstablished="0.249456" ConnectionsClosed="0.249456" ConnectionErrors="0" TLSPolicyFailures="0" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" TrackLatestType="Original" />
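
To compare both sides without grepping by hand, a small sketch like the one below can tally the N2_ReadError events in a trace file. It assumes the trace file has one self-closing Event element per line, as in the excerpts above; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_read_errors(lines):
    """Count N2_ReadError events per (reporting machine, peer, message)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event "):
            continue  # skip the XML prolog, wrapper tags, and blank lines
        try:
            event = ET.fromstring(line)
        except ET.ParseError:
            continue  # tolerate a partially written last line
        if event.get("Type") == "N2_ReadError":
            key = (event.get("Machine"), event.get("PeerAddr"), event.get("Message"))
            counts[key] += 1
    return counts


def summarize_file(path):
    """Print the tallies for one trace file, most frequent first."""
    with open(path, encoding="utf-8", errors="replace") as trace:
        for (machine, peer, message), n in summarize_read_errors(trace).most_common():
            print(f"{n:5d}  {machine} <- {peer}: {message}")
```

Calling summarize_file on the controller's trace and on the fdbcli trace should show whether the local decryption failures always sit on the client side and the received alerts always on the controller side.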

Something seems to go wrong specifically when fetching the status, because, surprisingly, everything else appears to work.

There is most likely a networking issue that is messing up the TLS stream. A bad MAC should not happen in any circumstances, unless your CPUs have a bug, or your network link is extremely unreliable.

My leading theory right now is that something in the environment triggers buggy behavior in fdbcli.

I’ve inspected the interface and there are barely any dropped packets and no truncated records.

We also have all of our other databases, workers and services running in this cluster with no issues (and not to mention the fact that fdb works fine without TLS too).

It’s also very curious that only getting the status fails. Configuring the database, setting and reading keys all work, even if the CLI says the database is unavailable.

From what I can see when running tcpdump for both sides, the client is sending this alert to the server after the handshake completes successfully.

I’m going to try to deploy this into a separate vanilla environment to see if it’s a rare condition that triggers it.

I don’t know the details of your Kubernetes setup but you should check your host network configuration and the configuration of the CNI (the networking component for Kubernetes). Those errors are usually caused by a mismatch in the MTU configuration, so that part of the packet is dropped.

Hi Johannes, thanks for the reply.

Would you mind elaborating on what you mean by a mismatch in the MTU configuration? Between the CNI and the host?

I don’t manage this cluster, but we run tons of workloads here including several databases (all of them with TLS) and afaik we haven’t seen something like this before.

I had suspected record truncation initially based on that error message but I didn’t see any truncated buffer issues in the interfaces or dropped packets.

I’ve narrowed the error to be caused by the fdbcli by now. The handshake succeeds but after a few encrypted records, the cli sends this alert to the server and restarts the connection.

I was even able to connect reliably to the database by compiling the cli locally for ARM on 7.1.67 and connecting to the kubernetes cluster, so it indeed looks like a problem when running fdbcli from those pods (including the operator, sadly).

The error also appears to be intermittent because around 10% of the client connections we attempt succeed (anecdotally).
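
To put a number on that instead of relying on anecdote, a sketch like the following could rerun the CLI and report the fraction of attempts that succeed. The success test is an assumption: it treats an attempt as good when the command exits with 0 and the output does not contain the "unavailable" banner shown earlier in this thread. The function names are made up for illustration.

```python
import subprocess


def default_is_ok(result: subprocess.CompletedProcess) -> bool:
    # Assumption: a failed attempt either exits non-zero or prints the
    # "unavailable" banner seen in the fdbcli output earlier in this thread.
    return result.returncode == 0 and "unavailable" not in result.stdout.lower()


def success_rate(cmd, attempts=50, timeout=30, is_ok=default_is_ok):
    """Run cmd repeatedly and return the fraction of attempts that is_ok accepts."""
    ok = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # count a hang as a failure
        if is_ok(result):
            ok += 1
    return ok / attempts
```

For example, success_rate(["fdbcli", "--exec", "status minimal"]) run from a storage pod and again from the locally compiled client would give comparable figures for the two environments.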

Would the status TLS records be particularly large?

Additional note from the team that manages the cluster:

We don’t use overlay networking for our Kubernetes setup, so there are no weird MTU nesting issues. Each pod is directly attached to the EC2 network via ipvlan, and the MTU is 9001 across the board.
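
On the record-size question, here is a rough bit of arithmetic as a sketch. It assumes IPv4, no TCP options, and a full-size TLS 1.3 record with an AES-GCM tag (the session above negotiated TLS_AES_256_GCM_SHA384); the function name is made up for illustration. It only counts how many TCP segments one maximum-size record needs at a given MTU and says nothing about where the corruption comes from.

```python
import math

TLS_RECORD_HEADER = 5        # bytes on the wire before the ciphertext
TLS_MAX_PLAINTEXT = 16384    # 2**14, the per-record plaintext limit
TLS13_INNER_TYPE = 1         # TLS 1.3 appends the real content type before encrypting
AEAD_TAG = 16                # AES-GCM authentication tag


def segments_per_full_record(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Number of TCP segments needed to carry one maximum-size TLS 1.3 record."""
    mss = mtu - ip_header - tcp_header
    record_on_wire = TLS_RECORD_HEADER + TLS_MAX_PLAINTEXT + TLS13_INNER_TYPE + AEAD_TAG
    return math.ceil(record_on_wire / mss)


for mtu in (9001, 1500):
    print(mtu, segments_per_full_record(mtu))
# prints:
# 9001 2
# 1500 12
```

So even at MTU 9001 a full-size record is split across two segments, and any hop with a smaller effective MTU would split it further.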