Bad record mac when trying to use TLS with FoundationDB on Kubernetes

Hi folks,

I’m currently working on updating a FoundationDB deployment on Kubernetes to use TLS, but I’m facing a problem: the database remains unavailable, and decryption seemingly fails on several peers with the following trace messages:

<Event Severity="20" Time="1752494789.352593" DateTime="2025-07-14T12:06:29Z" Type="N2_ReadError" ID="7223bda2958f8423" SuppressedEventCount="0" PeerAddr="172.22.175.145:4500:tls" PeerAddress="172.22.175.145:4500:tls" ErrorCode="336130329" Message="decryption failed or bad record mac (SSL routines, ssl3_get_record)" ThreadID="255555187182270082" Machine="172.22.175.145:88" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-12058051743462299474" />
<Event Severity="20" Time="1752494789.352708" DateTime="2025-07-14T12:06:29Z" Type="N2_ReadError" ID="1a7e49b052c07e46" SuppressedEventCount="0" PeerAddr="172.22.175.145:36550:tls" PeerAddress="172.22.175.145:36550:tls" ErrorCode="336151548" Message="sslv3 alert bad record mac (SSL routines, ssl3_read_bytes)" ThreadID="4397127099531056342" Machine="172.22.175.145:4500" LogGroup="fdb-cluster" Roles="CC" />

I’ve verified that these are actual pod IPs and validated the certificates using openssl locally.
The certificates contain the chain leading up to the CA, which is a self-signed certificate generated by our infrastructure.
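The ErrorCode values in the trace messages above appear to be packed OpenSSL 1.1.x error codes, where ERR_GET_LIB, ERR_GET_FUNC and ERR_GET_REASON extract an 8-bit library, a 12-bit function and a 12-bit reason. Here is a minimal Python sketch to decode them, assuming this FDB build links OpenSSL 1.1.x. The name tables are a small hand-copied subset of OpenSSL 1.1.1's sslerr.h and should be checked against your own build.

```python
from typing import Tuple

# Sketch: decode OpenSSL 1.1.x packed error codes
# (layout: 8-bit library | 12-bit function | 12-bit reason).
# Names are a hand-copied subset of OpenSSL 1.1.1's sslerr.h; verify locally.

ERR_LIB_SSL = 20
SSL_FUNCTIONS = {143: "SSL_F_SSL3_GET_RECORD", 148: "SSL_F_SSL3_READ_BYTES"}
SSL_REASONS = {
    # This side failed to decrypt/authenticate an incoming record.
    281: "SSL_R_DECRYPTION_FAILED_OR_BAD_RECORD_MAC",
    # The peer sent a fatal bad_record_mac alert (1000 + TLS alert code 20).
    1020: "SSL_R_SSLV3_ALERT_BAD_RECORD_MAC",
}


def decode_openssl_1_1(code: int) -> Tuple[int, int, int]:
    """Split a packed OpenSSL 1.1.x error code into (library, function, reason)."""
    return (code >> 24) & 0xFF, (code >> 12) & 0xFFF, code & 0xFFF


def describe(code: int) -> str:
    """Human-readable form, using the name tables when the code is from libssl."""
    lib, func, reason = decode_openssl_1_1(code)
    if lib != ERR_LIB_SSL:
        return f"lib={lib} func={func} reason={reason}"
    return f"SSL: {SSL_FUNCTIONS.get(func, func)}: {SSL_REASONS.get(reason, reason)}"


if __name__ == "__main__":
    for code in (336130329, 336151548):  # ErrorCode values from the traces above
        print(code, describe(code))
```

Decoded this way, the first event (a client process on the CC pod) failed to decrypt a record it received, while the second (the CC itself) received a bad_record_mac alert from its peer. That is consistent with one side failing decryption and alerting the other.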

Some additional details (and I can provide more if needed):

  • Operator version: 2.8.0. I had to fork it to switch the HealthProbeBindAddress to IPv4, but it’s otherwise unchanged.
  • FDB version: 7.3.63
  • useDNSInClusterFile: true
  • enableTls: true

With the following environment variables in all containers (operator and fdb):

{
    name: 'FDB_TLS_CA_FILE',
    value: '/var/fdb-certs/ca.crt',
},
{
    name: 'FDB_TLS_CERTIFICATE_FILE',
    value: '/var/fdb-certs/tls.crt',
},
{
    name: 'FDB_TLS_KEY_FILE',
    value: '/var/fdb-certs/tls.key',
}
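One quick way to rule out a mismatched certificate/key pair or an unreadable CA file inside a pod is to load the same three files with Python's standard ssl module. This is a minimal sketch, assuming Python 3 is available in the container. It only checks that the files exist and load, and that the key matches the certificate. It does not validate the chain against the CA, which was already done with openssl.

```python
import os
import ssl
from typing import List, Mapping, Optional

# The three variables set on every container in the deployment above.
TLS_ENV_VARS = ("FDB_TLS_CA_FILE", "FDB_TLS_CERTIFICATE_FILE", "FDB_TLS_KEY_FILE")


def check_fdb_tls_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the FDB TLS files (empty list = none found)."""
    env = os.environ if env is None else env
    problems: List[str] = []
    paths = {}
    for name in TLS_ENV_VARS:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{name}={path} does not exist")
        else:
            paths[name] = path

    if len(paths) < len(TLS_ENV_VARS):
        return problems  # no point loading a partial set

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        ctx.load_verify_locations(cafile=paths["FDB_TLS_CA_FILE"])
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        problems.append(f"CA file could not be loaded: {exc}")
    try:
        # Fails if either file is unreadable or the key does not match the certificate.
        ctx.load_cert_chain(paths["FDB_TLS_CERTIFICATE_FILE"], paths["FDB_TLS_KEY_FILE"])
    except OSError as exc:
        problems.append(f"certificate/key could not be loaded: {exc}")
    return problems


if __name__ == "__main__":
    issues = check_fdb_tls_env()
    print("\n".join(issues) if issues else "TLS files load and the key matches the certificate")
```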

If I try to connect to the database using fdbcli even from the cluster controller, I get the following:

[root@fdb-cluster-cluster-controller-2692/foundationdb fdb]# fdbcli
Using cluster file `/var/dynamic-conf/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `/var/dynamic-conf/fdb.cluster'.

Unable to communicate with the cluster controller at 172.18.160.232:4500:tls to
get status.

Configuration:
  Redundancy mode        - unknown
  Storage engine         - unknown
  Log engine             - unknown
  Encryption at-rest     - disabled
  Coordinators           - unknown
  Usable Regions         - unknown

Cluster:
  FoundationDB processes - unknown
  Zones                  - unknown
  Machines               - unknown

Data:
  Replication health     - unknown
  Moving data            - unknown
  Sum of key-value sizes - unknown
  Disk space used        - unknown

Operating space:
  Unable to retrieve operating space status

Workload:
  Read rate              - unknown
  Write rate             - unknown
  Transactions started   - unknown
  Transactions committed - unknown
  Conflict rate          - unknown

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:

Coordination servers:
  fdb-cluster-log-16039.fdb-cluster.doris.svc.clusterdomain:4500:tls  (reachable)
  fdb-cluster-storage-65093.fdb-cluster.doris.svc.clusterdomain:4500:tls  (reachable)
  fdb-cluster-storage-76380.fdb-cluster.doris.svc.clusterdomain:4500:tls  (reachable)

Client time: 07/14/25 13:16:41

I’d appreciate any clues or suggestions on what to look out for that might be causing this problem.

Adding some additional details after testing this for a while.

I’ve come to the conclusion that, for some reason, the problem only happens when connecting to the cluster controller.

If I set a key from one pod and read it from a different pod (even though the database says it’s unavailable), I get the results back.

I can’t ignore the error because it’s causing operator reconciliation problems, sadly.

Additionally, I tested the network, the pod IPs, and the certificates by running an openssl server on the cluster controller and an openssl client on a random storage pod. With the setup below, I’m able to connect, validate both the client and server certificates, and send data:

On the controller pod:

openssl s_server -accept 4433 -cert /var/fdb-certs/tls.crt -key /var/fdb-certs/tls.key -CAfile /var/fdb-certs/ca.crt -verify 10

On the storage pod:

openssl s_client -connect {sameIpFdbIsTryingToUse}:4433 -cert /var/fdb-certs/tls.crt -key /var/fdb-certs/tls.key -CAfile /var/fdb-certs/ca.crt

I’ve tried using the same commands to debug the connection to the controller, and I end up with the output below. The handshakes seem to work, but I’m not sure whether the result points to an actual problem:

openssl s_client -connect 172.22.246.74:4500 -CAfile /var/fdb-certs/ca.crt -cert /var/fdb-certs/tls.crt -key /var/fdb-certs/tls.key -showcerts -verify_return_error
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 CN = fdb-cluster-doris-internal
verify return:1
depth=0 CN = fdb-cluster
verify return:1
---
Certificate chain
 0 s:CN = fdb-cluster
   i:CN = fdb-cluster-doris-internal
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Jul 15 16:13:06 2025 GMT; NotAfter: Aug 14 16:13:06 2025 GMT
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
 1 s:CN = fdb-cluster-doris-internal
   i:CN = fdb-cluster-doris-internal
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Jul 10 16:07:01 2025 GMT; NotAfter: Jul  3 16:07:01 2055 GMT
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
---
Server certificate
subject=CN = fdb-cluster
issuer=CN = fdb-cluster-doris-internal
---
No client certificate CA names sent
Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:0x03+0x02:RSA+SHA224:0x01+0x02
Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 2276 bytes and written 2345 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: C41CCFBB83A95635D0D9E8016BCC39E2973E8B5787DBA6A77FD181CF34303C9F
    Session-ID-ctx:
    Resumption PSK: FEC2CC3AF47F22DAF39559F92F908CB985595B769B0B8A9B20ED96A7BE3FA6C2E48D9CC661CF61A9586E5B3716AE6F84
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - d9 d2 93 04 72 db bb 2b-d5 9b a9 e1 86 3c a4 ab   ....r..+.....<..
    0010 - bc ec 61 7f dc 37 0c a9-1d 75 c9 16 81 a1 f6 f5   ..a..7...u......
    0020 - d7 e7 4f 2c 5c 6b 31 cb-b8 b0 2a 05 02 77 00 c6   ..O,\k1...*..w..
    0030 - 28 3c c3 c4 65 96 a1 7f-b8 07 2a 47 0e cb 29 f8   (<..e.....*G..).
    0040 - 3d 99 ef f3 a5 c5 de 26-37 5c 3d ee 4b b9 97 5b   =......&7\=.K..[
    0050 - f2 e8 14 30 41 6a be 73-ab 58 af d3 f6 e4 57 e1   ...0Aj.s.X....W.
    0060 - e0 9c c1 98 17 37 92 87-aa a5 1c b4 97 67 15 ac   .....7.......g..
    0070 - ab 27 12 bb 0f a3 cd 57-a6 36 1c 50 a5 54 4d 86   .'.....W.6.P.TM.
    0080 - 20 aa df 39 62 d6 5d c4-da f0 8d c6 31 0d 1d 1a    ..9b.].....1...
    0090 - 0f 21 93 7d 79 12 1a 5b-f3 c2 e0 00 64 20 b8 c9   .!.}y..[....d ..
    00a0 - cd cf 2c eb dd e6 29 1c-75 eb 02 f7 74 62 c5 73   ..,...).u...tb.s
    00b0 - dc 6f c9 46 cc dd 44 47-54 7f 49 ec b8 f1 1e b4   .o.F..DGT.I.....
    00c0 - 82 58 1a 75 56 82 7d cf-c8 a7 07 e4 63 a2 ab 98   .X.uV.}.....c...
    00d0 - de 3d 02 8e fd a3 d3 d2-20 b7 b0 b5 0e eb ba 94   .=...... .......
    00e0 - 49 2e 4e 71 6c 9e 78 37-02 ce 9a 40 48 c7 16 e0   I.Nql.x7...@H...
    00f0 - bc 2a 50 a6 4e 1f ce 36-d4 db 66 75 06 3c 0d 8f   .*P.N..6..fu.<..
    0100 - e9 7f 8c 4b 8a 1d 9d 92-3c 88 34 3b ba 89 76 8a   ...K....<.4;..v.
    0110 - dc a7 f5 77 af 98 c9 4e-cd cd a6 bd 80 5f 6a 2e   ...w...N....._j.
    0120 - ff 87 4e 55 95 84 42 69-d8 ed 1b 5a 10 c7 56 ca   ..NU..Bi...Z..V.
    0130 - b4 16 46 5f 7f 8f c9 8a-64 a8 69 af 12 bd 70 89   ..F_....d.i...p.
    0140 - 01 f4 4f 68 93 b4 4f 1f-d9 ba dc 3d 43 b5 25 ea   ..Oh..O....=C.%.
    0150 - 0b 2a 23 99 9f ce 04 49-99 d9 ba 4c 1a fd e2 4c   .*#....I...L...L
    0160 - b3 71 1d 24 c5 8a 89 c8-f7 8d 87 f3 21 e4 c6 76   .q.$........!..v
    0170 - f3 8b 1f 2d 1d e7 d0 9a-94 22 51 74 fd ef 22 87   ...-....."Qt..".
    0180 - 94 40 5e af 48 fa 44 ca-d0 52 57 0d 1a 1b 0c 94   .@^.H.D..RW.....
    0190 - 6a 87 71 46 ff 4a 0d d3-9c 41 a7 0a ab 10 c3 02   j.qF.J...A......
    01a0 - dc b0 9b 6a 7c 2f 65 6a-59 62 10 c6 7d 38 0c c9   ...j|/ejYb..}8..
    01b0 - c5 a4 63 8b f0 ff f4 25-54 84 88 99 ef 92 aa e6   ..c....%T.......
    01c0 - 62 e6 75 0e 98 7d d2 06-69 d1 43 3b e5 f7 85 e0   b.u..}..i.C;....
    01d0 - dd 4c d1 73 2a cb 63 d2-08 94 28 4f 67 98 c0 36   .L.s*.c...(Og..6
    01e0 - 8d 79 bf c4 f5 8c c3 a4-47 07 d5 b9 18 d2 40 66   .y......G.....@f
    01f0 - d1 b9 46 2b 9b b1 65 dc-14 34 c6 38 cc 67 16 1d   ..F+..e..4.8.g..
    0200 - e4 f1 cf d6 5c 1f 2d 49-02 6a 2d 74 d5 99 2e e7   ....\.-I.j-t....
    0210 - 33 6f e2 18 4d be fb 64-e4 ba aa c2 06 73 b5 40   3o..M..d.....s.@
    0220 - 93 7a 86 ae 27 fe 1f e8-46 77 b6 87 d0 c5 b4 38   .z..'...Fw.....8
    0230 - 95 42 9b 78 0b ef d4 46-2b 1b 18 55 35 23 7e de   .B.x...F+..U5#~.
    0240 - 67 da d6 0f 3c eb 85 84-79 d3 07 cd c9 ef f3 3a   g...<...y......:
    0250 - a2 21 7a e5 1d e2 32 9d-02 47 96 1a 4e 67 7b 94   .!z...2..G..Ng{.
    0260 - 1b e5 45 13 0d 01 cd 36-f7 99 bc 1f 3c d1 d7 01   ..E....6....<...
    0270 - b4 3e 79 ef 11 1b 8f 95-37 65 6c e0 86 1e d0 ca   .>y.....7el.....
    0280 - 45 dc 25 41 ee 55 12 ba-e3 b8 c5 7f 95 4a 9f 56   E.%A.U.......J.V
    0290 - 29 61 85 ac 5c 83 c7 81-7d a6 34 0a e6 ae 77 42   )a..\...}.4...wB
    02a0 - 2f 3d db b9 b0 63 5d 40-b1 b7 94 97 21 f1 08 99   /=...c]@....!...
    02b0 - 81 a0 30 94 e8 5d 64 3a-91 31 95 9e d5 ec ed 7e   ..0..]d:.1.....~
    02c0 - 1e a8 48 8d 78 e4 ea 99-71 f7 a5 15 42 a3 fe fc   ..H.x...q...B...
    02d0 - fe 84 19 55 a6 ca ac cf-07 ec c8 22 ba 34 58 b0   ...U.......".4X.
    02e0 - 78 15 c2 1b e0 ea 65 30-6b c0 08 8d 90 71 ae 08   x.....e0k....q..
    02f0 - 97 f2 70 71 6c b7 15 d7-94 94 75 58 25 ab e3 e0   ..pql.....uX%...
    0300 - 00 ff a0 95 de bf 4a 8c-fc b4 f4 8f b6 59 cd 8e   ......J......Y..
    0310 - 67 3d 60 19 98 e4 08 f2-4f 8e 87 44 ba a7 58 84   g=`.....O..D..X.
    0320 - 89 32 45 56 2b fc 05 da-94 44 ef 66 93 55 c4 79   .2EV+....D.f.U.y
    0330 - 8e 35 74 1e e7 39 c5 90-73 a5 75 85 12 49 ff 43   .5t..9..s.u..I.C
    0340 - 1b 6a 8c 1b 5e cb 8b 8b-0a db 23 56 a4 ed 00 cb   .j..^.....#V....
    0350 - 10 e9 a8 8a 5f 07 ce 09-90 20 ea f8 94 a8 56 b8   ...._.... ....V.
    0360 - 98 d2 6c 9a 83 df 2d ab-6e a4 d3 d9 6c bc b6 95   ..l...-.n...l...
    0370 - b1 60 e1 f9 f4 7d ba 3e-69 58 56 e0 59 86 fd 0e   .`...}.>iXV.Y...
    0380 - 35 db 5a 89 a5 b3 c4 ad-50 96 d9 46 b3 88 a7 77   5.Z.....P..F...w
    0390 - aa 17 a3 84 a2 f8 72 0d-f4 87 ec 39 e4 e7 a5 0d   ......r....9....
    03a0 - 81 5d 9c bd c7 43 66 db-d4 1f 89 01 57 28 97 03   .]...Cf.....W(..
    03b0 - 39 21 d3 4c 9f b2 60 a4-f0 92 0a cc b5 00 03 f7   9!.L..`.........
    03c0 - 7e f3 00 c3 78 e1 4e 73-7c 19 fc eb 56 4d 89 aa   ~...x.Ns|...VM..
    03d0 - 53 c1 49 a5 37 ef c3 15-cb 58 8c 85 c5 d4 4e eb   S.I.7....X....N.
    03e0 - dc 44 2a f4 62 bf 51 66-9f 47 fd 12 fc e3 d0 6d   .D*.b.Qf.G.....m
    03f0 - 54 c5 fd 68 7f a5 78 c9-44 46 6d f9 17 6d a5 76   T..h..x.DFm..m.v
    0400 - 6a ce 4b 32 04 52 d0 58-2e 51 d1 11 d7 9e af 4d   j.K2.R.X.Q.....M

    Start Time: 1752611426
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: 82AD945726B8BF185EB24C0F1773C6694328470185CDEAF72AB930782CDD40D6
    Session-ID-ctx:
    Resumption PSK: 4EB1682D197EDD70FB5EA608CCE674D54E8876E2BB6E767CF57F83C32D5BF568035CDC626DA427F5C574EEDCB567E56A
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - d9 d2 93 04 72 db bb 2b-d5 9b a9 e1 86 3c a4 ab   ....r..+.....<..
    0010 - 4e 63 e0 98 0e 18 17 9f-fa 0a 50 3f 6c 86 16 33   Nc........P?l..3
    0020 - 9e cc cd fb 07 76 f1 72-d4 ad bc 40 bf 4a d8 c1   .....v.r...@.J..
    0030 - 7c f6 d5 9e 89 cd f1 a1-b0 26 69 1b fa 63 b9 02   |........&i..c..
    0040 - af 42 83 c3 49 29 f6 ba-5f 09 e1 b3 f4 64 5c 43   .B..I).._....d\C
    0050 - 36 8a 0c cd 26 4c 44 04-ae 36 3c 3c e7 c6 3a 3a   6...&LD..6<<..::
    0060 - 6e 61 8c b1 b4 76 33 e2-0f 79 ad 4a 2e b6 a8 18   na...v3..y.J....
    0070 - ac 15 1f fd 52 76 72 cc-04 b1 a9 8c b6 80 9c 7e   ....Rvr........~
    0080 - 7a f4 a2 96 f7 90 9d ec-f6 6a b6 5e e3 bc dd d3   z........j.^....
    0090 - 65 7c e4 58 e3 75 fd 86-3f 9d 3f 56 a1 22 68 49   e|.X.u..?.?V."hI
    00a0 - d8 f0 3d 8d fc 86 a6 b4-18 5a 50 d8 89 25 46 1f   ..=......ZP..%F.
    00b0 - be ed be c2 f2 8d 6a 70-0d e7 8c 48 73 96 20 71   ......jp...Hs. q
    00c0 - 3e 6d 85 22 22 15 08 0c-3e 51 b8 23 7e 54 c3 f6   >m.""...>Q.#~T..
    00d0 - bb 15 35 21 4d fb 04 85-db 93 4b 7d da 6f 75 1e   ..5!M.....K}.ou.
    00e0 - 5a 34 f4 14 62 5d 4c 58-b5 26 fd 80 57 29 46 1e   Z4..b]LX.&..W)F.
    00f0 - 51 11 f4 b1 29 98 96 48-26 b1 8f 0c 6d 0f c8 c8   Q...)..H&...m...
    0100 - 74 f4 20 b8 bb 7d b8 5b-ce af 4c 41 c4 c0 a7 47   t. ..}.[..LA...G
    0110 - cc aa 8e 70 b5 17 f6 e1-4f 6c 0d c7 88 df 9c 45   ...p....Ol.....E
    0120 - 3b 32 21 73 d0 76 4c 34-e7 96 f1 64 fe 2e a3 d5   ;2!s.vL4...d....
    0130 - ef 0e 60 fa f9 d5 4e 25-e7 96 c6 fa 2d 10 79 70   ..`...N%....-.yp
    0140 - 2b d9 09 6d 13 e8 1c d7-5c 15 15 f3 12 3e 7c de   +..m....\....>|.
    0150 - 91 cf bf 8b 1b f3 56 aa-c7 e2 40 59 0a 93 4c 49   ......V...@Y..LI
    0160 - eb a8 02 06 8e b9 9c 45-9a ea 09 5d 80 07 96 01   .......E...]....
    0170 - 57 9e f6 7c f9 2e 4e 08-0c d6 6b ad 0f 0c 81 24   W..|..N...k....$
    0180 - dd 60 a6 ee ee 6e 64 c1-32 d0 41 2b ec 38 73 6c   .`...nd.2.A+.8sl
    0190 - f7 83 e3 98 b0 62 c3 53-b9 ed b8 41 60 9f 1d be   .....b.S...A`...
    01a0 - b6 8d 68 05 fc 4e e7 e9-72 85 fd 18 7c ab 2b f1   ..h..N..r...|.+.
    01b0 - 10 be 22 b7 54 b7 07 ff-ab 71 d5 32 70 32 fb b1   ..".T....q.2p2..
    01c0 - 46 ef 60 88 4b aa e2 a7-04 01 e4 38 e1 2a 83 0e   F.`.K......8.*..
    01d0 - dc c7 71 29 e6 f5 fd 24-62 60 7f 4f 86 35 c6 84   ..q)...$b`.O.5..
    01e0 - 3f 14 54 c6 ac a1 94 d5-64 2b ef d5 0f 0f 99 37   ?.T.....d+.....7
    01f0 - ae 3d f7 09 95 ff f4 ea-89 6f c4 2e 2f 72 9e b2   .=.......o../r..
    0200 - ba 5d d2 b9 87 1e 73 35-3a 3b 45 7a 89 3c fd 7e   .]....s5:;Ez.<.~
    0210 - 9f 5d 0d 7e 1a ce 0c 70-30 ba 36 d3 a2 be 62 28   .].~...p0.6...b(
    0220 - b3 b6 b0 bf 84 83 be 83-3e 63 61 28 3c de 67 0b   ........>ca(<.g.
    0230 - 06 cb f5 db 2c 37 2d 08-ba 4c 08 6b b0 b0 68 ad   ....,7-..L.k..h.
    0240 - 85 7e 0d 3d c9 db 3d ff-93 10 fc f4 9a 35 8e 3e   .~.=..=......5.>
    0250 - 23 5e 6a 90 53 48 cf 8b-65 bf ad ee b5 ea 4c 72   #^j.SH..e.....Lr
    0260 - 24 90 37 9b 0a d1 01 92-dd 47 9d 1c b3 50 06 c4   $.7......G...P..
    0270 - 92 ae 93 65 46 4a 57 39-7a cd f6 97 3c 2f a8 58   ...eFJW9z...</.X
    0280 - 85 4e 59 e4 01 56 c3 4f-f9 1a 4d 3e a7 7c 20 99   .NY..V.O..M>.| .
    0290 - 41 91 01 91 44 ed 60 d9-07 b3 f9 9c 0c da ca b9   A...D.`.........
    02a0 - 30 4b 8d 29 38 67 94 ac-3c 10 fc 87 2a fd 35 5f   0K.)8g..<...*.5_
    02b0 - 17 f9 80 2c 65 dc e4 ca-f6 cd d2 4e f3 13 df 91   ...,e......N....
    02c0 - c8 94 80 84 2a 64 e4 44-60 b1 9a bf 60 c6 46 2a   ....*d.D`...`.F*
    02d0 - 35 6f e9 3a da f5 68 2e-3a c9 30 43 74 14 ca 27   5o.:..h.:.0Ct..'
    02e0 - 45 31 19 4f a1 fb 8f 22-ab fd 00 50 01 1f ad 92   E1.O..."...P....
    02f0 - 2d fe ba f8 53 cb a8 89-2a 6e 97 68 9c 5a 80 b5   -...S...*n.h.Z..
    0300 - bc fb 4e 58 22 90 20 84-7b 00 18 9e 72 fe 86 3e   ..NX". .{...r..>
    0310 - b4 f0 e5 6e 69 48 f9 25-38 51 7c 96 d4 a6 3d 4d   ...niH.%8Q|...=M
    0320 - 9e 9a 08 97 5f c0 91 a1-6d 77 58 4b 74 da cf e0   ...._...mwXKt...
    0330 - ec fc 51 99 17 a3 f7 e8-b9 b9 a5 f5 b3 84 53 84   ..Q...........S.
    0340 - 82 af d5 fc a9 d7 4d 99-89 03 ea f1 4d 07 f6 18   ......M.....M...
    0350 - c4 ce a8 27 2d b6 d1 fd-41 8e 4b e2 94 c3 99 97   ...'-...A.K.....
    0360 - 01 75 16 71 b2 dc 8d 1f-ce 6e 9d e9 b3 63 69 a3   .u.q.....n...ci.
    0370 - c7 6b 6c 83 4c 65 8a 40-28 66 94 a4 89 8e 88 c2   .kl.Le.@(f......
    0380 - cf 81 ee 61 46 36 7d 8c-bf cc d1 f5 f6 8a 80 fb   ...aF6}.........
    0390 - 2c ea 5c 7d de 13 ec 26-d8 bc fa 7d a0 19 45 a0   ,.\}...&...}..E.
    03a0 - f3 a3 30 26 e3 b0 d5 5d-28 ec dd 1e 1e d9 34 7a   ..0&...](.....4z
    03b0 - 40 05 65 c3 31 0d 19 94-de f1 90 d2 28 a6 bd 76   @.e.1.......(..v
    03c0 - 53 d4 43 d3 30 9b c7 b4-ef d3 a0 b8 79 fd 2e 90   S.C.0.......y...
    03d0 - 63 d5 71 83 ab e3 91 1f-2f 69 b6 bc c6 68 e4 a2   c.q...../i...h..
    03e0 - 05 ff 28 25 f0 c5 d5 a5-40 5f 67 cb b4 aa 5e 37   ..(%....@_g...^7
    03f0 - 41 7f fb 00 45 ff 4a e6-e2 5c e6 9e 68 ff 71 62   A...E.J..\..h.qb
    0400 - 22 8a 41 d8 9f cf 83 b9-11 73 b7 54 f6 41 9c ad   ".A......s.T.A..

    Start Time: 1752611426
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK
40CCD85A467F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
*** WARNING: The exit code was: 1 ***
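The final error line uses OpenSSL 3's error layout, where the library sits in bits 23 to 30 and the reason takes the low 23 bits (there is no function field any more). Decoding 0A000126 gives library 20 (SSL) and reason 294, which I believe is SSL_R_UNEXPECTED_EOF_WHILE_READING in OpenSSL 3's sslerr.h, matching the printed text: the TCP connection was closed without a TLS close_notify. Since s_client does not speak FDB's wire protocol, the server dropping the connection may simply be expected here rather than a sign of the bad MAC issue. A small sketch of the decoding, with the bit layout taken as an assumption from OpenSSL 3's err.h:

```python
from typing import Tuple

# Sketch: decode an OpenSSL 3.x error code as printed by the openssl CLI
# (e.g. "0A000126"). Assumed layout: bit 31 = system (errno) flag,
# bits 23..30 = library, bits 0..22 = reason.

ERR_SYSTEM_FLAG = 1 << 31
ERR_LIB_OFFSET = 23
ERR_LIB_MASK = 0xFF
ERR_REASON_MASK = 0x7FFFFF


def decode_openssl_3(hex_code: str) -> Tuple[int, int]:
    """Return (library, reason) for a non-system OpenSSL 3 error code."""
    code = int(hex_code, 16)
    if code & ERR_SYSTEM_FLAG:
        raise ValueError("system (errno) error, not a library error")
    return (code >> ERR_LIB_OFFSET) & ERR_LIB_MASK, code & ERR_REASON_MASK


if __name__ == "__main__":
    lib, reason = decode_openssl_3("0A000126")
    print(f"library={lib} reason={reason}")
```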

I’ve also traced the exact moment I run fdbcli from a random pod, and the controller logs these events for that pod’s IP:

tail -f trace.172.22.246.74.4500.1752596046.fepwcO.0.6.xml | grep "172.22.150.122"
<Event Severity="10" Time="1752610536.278328" DateTime="2025-07-15T20:15:36Z" Type="ConnectionFrom" ID="5a223cc3c3b4b504" SuppressedEventCount="0" FromAddress="172.22.150.122:36260:tls" ListenAddress="172.22.246.74:4500:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.340134" DateTime="2025-07-15T20:15:36Z" Type="ConnectionEstablished" ID="5a223cc3c3b4b504" SuppressedEventCount="0" Peer="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ConnectionId="0" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.340134" DateTime="2025-07-15T20:15:36Z" Type="NotifyAddressHealthy" ID="0000000000000000" SuppressedEventCount="0" Address="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.340134" DateTime="2025-07-15T20:15:36Z" Type="IncomingConnection" ID="5a223cc3c3b4b504" FromAddr="172.22.150.122:36260:tls" CanonicalAddr="172.22.150.122:36260:tls" IsPublic="0" Trusted="1" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="20" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="N2_ReadError" ID="5a223cc3c3b4b504" SuppressedEventCount="0" PeerAddr="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ErrorCode="336151548" Message="sslv3 alert bad record mac (SSL routines, ssl3_read_bytes)" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="NotifyAddressFailed" ID="0000000000000000" SuppressedEventCount="0" Address="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="ConnectionClosed" ID="5a223cc3c3b4b504" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="3" PeerAddr="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610536.367504" DateTime="2025-07-15T20:15:36Z" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="172.22.150.122:36260:tls" PeerAddress="172.22.150.122:36260:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610540.318695" DateTime="2025-07-15T20:15:40Z" Type="ConnectionFrom" ID="b324e16fcab7ec4c" SuppressedEventCount="0" FromAddress="172.22.150.122:23922:tls" ListenAddress="172.22.246.74:4500:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="20" Time="1752610540.372360" DateTime="2025-07-15T20:15:40Z" Type="N2_ReadError" ID="b324e16fcab7ec4c" SuppressedEventCount="0" PeerAddr="172.22.150.122:23922:tls" PeerAddress="172.22.150.122:23922:tls" ErrorCode="1" Message="stream truncated" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />
<Event Severity="10" Time="1752610540.372360" DateTime="2025-07-15T20:15:40Z" Type="IncomingConnectionError" ID="b324e16fcab7ec4c" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" FromAddress="172.22.150.122:23922:tls" ThreadID="295456964672098457" Machine="172.22.246.74:4500" LogGroup="fdb-cluster" Roles="CC" />

Interestingly, the fdbcli running on the storage node is also reporting the same problem:

<Event Severity="20" Time="1752604993.099479" DateTime="2025-07-15T18:43:13Z" Type="N2_ReadError" ID="2bb31bbc9cd8401c" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ErrorCode="336130329" Message="decryption failed or bad record mac (SSL routines, ssl3_get_record)" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752604993.099479" DateTime="2025-07-15T18:43:13Z" Type="ConnectionClosed" ID="2bb31bbc9cd8401c" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752604998.045402" DateTime="2025-07-15T18:43:18Z" Type="ProcessMetrics" ID="0000000000000000" Elapsed="4.99813" CPUSeconds="0.020354" MainThreadCPUSeconds="0.016562" UptimeSeconds="5.00002" Memory="563318784" ResidentMemory="25124864" UnusedAllocatedMemory="0" MbpsSent="0.0115627" MbpsReceived="0.0398037" DiskTotalBytes="0" DiskFreeBytes="0" DiskQueueDepth="0" DiskIdleSeconds="0" DiskReads="0" DiskReadSeconds="0" DiskWrites="0" DiskWriteSeconds="0" DiskReadsCount="0" DiskWritesCount="0" DiskWriteSectors="0" DiskReadSectors="0" FileWrites="0" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" CacheHits="0" CacheMisses="0" CacheEvictions="0" DCID="[not set]" ZoneID="[not set]" MachineID="[not set]" Version="[not set]" AIOSubmitCount="0" AIOCollectCount="0" AIOSubmitLag="0" AIODiskStall="0" CurrentConnections="3" ConnectionsEstablished="0.8003" ConnectionsClosed="0.200075" ConnectionErrors="0" TLSPolicyFailures="0" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" TrackLatestType="Original" />
<Event Severity="20" Time="1752605000.424030" DateTime="2025-07-15T18:43:20Z" Type="N2_ReadError" ID="11d39bbc9a4bd4ac" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ErrorCode="336130329" Message="decryption failed or bad record mac (SSL routines, ssl3_get_record)" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752605000.424030" DateTime="2025-07-15T18:43:20Z" Type="ConnectionClosed" ID="11d39bbc9a4bd4ac" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="172.22.246.74:4500:tls" PeerAddress="172.22.246.74:4500:tls" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752605000.424030" DateTime="2025-07-15T18:43:20Z" Type="CLICommandLog" ID="7316d385f672dd1a" Command="status details" IsError="0" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="30" Time="1752605001.962707" DateTime="2025-07-15T18:43:21Z" Type="ErrorSavingCliHistory" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" Filename="/root/.fdbcli_history" UnixErrorCode="1e" UnixError="Read-only file system" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" />
<Event Severity="10" Time="1752605001.962707" DateTime="2025-07-15T18:43:21Z" Type="ProcessMetrics" ID="0000000000000000" Elapsed="4.00872" CPUSeconds="0.011377" MainThreadCPUSeconds="0.00858" UptimeSeconds="8.91732" Memory="563298304" ResidentMemory="25825280" UnusedAllocatedMemory="519328" MbpsSent="0.00334471" MbpsReceived="0.0315791" DiskTotalBytes="0" DiskFreeBytes="0" DiskQueueDepth="0" DiskIdleSeconds="0" DiskReads="0" DiskReadSeconds="0" DiskWrites="0" DiskWriteSeconds="0" DiskReadsCount="0" DiskWritesCount="0" DiskWriteSectors="0" DiskReadSectors="0" FileWrites="0" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" CacheHits="0" CacheMisses="0" CacheEvictions="0" DCID="[not set]" ZoneID="[not set]" MachineID="[not set]" Version="[not set]" AIOSubmitCount="0" AIOCollectCount="0" AIOSubmitLag="0" AIODiskStall="0" CurrentConnections="3" ConnectionsEstablished="0.249456" ConnectionsClosed="0.249456" ConnectionErrors="0" TLSPolicyFailures="0" ThreadID="120965074470983212" Machine="172.22.184.52:189" LogGroup="fdb-cluster" ClientDescription="primary-7.3.63-8947002287012609945" TrackLatestType="Original" />
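When comparing both sides, it can help to reduce the trace files to N2_ReadError counts per peer instead of tailing and grepping by IP. This is a minimal sketch using only Python's standard library. It assumes one <Event .../> element per line, as in the excerpts above, and IPv4 peer addresses. The script name in the usage comment is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, Iterator


def parse_trace_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the attributes of each <Event .../> line in an FDB XML trace file."""
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event "):
            continue  # skip anything that is not a single Event line
        try:
            yield dict(ET.fromstring(line).attrib)
        except ET.ParseError:
            continue  # e.g. a partially written last line while the file is live


def read_errors_by_peer(events: Iterable[Dict[str, str]]) -> Counter:
    """Count N2_ReadError events keyed by (peer IP, OpenSSL message)."""
    counts: Counter = Counter()
    for event in events:
        if event.get("Type") != "N2_ReadError":
            continue
        peer_ip = event.get("PeerAddress", "").split(":")[0]  # IPv4 only
        counts[(peer_ip, event.get("Message", ""))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 read_errors.py trace.*.xml
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as trace_file:
            counts = read_errors_by_peer(parse_trace_events(trace_file))
            for (peer, message), n in counts.most_common():
                print(f"{path}: {n:5d} {peer} {message}")
```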

There seems to be an issue specifically with retrieving the status, because, surprisingly, everything else appears to work.

There is most likely a networking issue that is corrupting the TLS stream. A bad MAC should not happen under any circumstances, unless your CPUs have a bug or your network link is extremely unreliable.


My leading theory right now is that something in the environment triggers buggy behavior in fdbcli.

I’ve inspected the interface, and there are barely any dropped packets and no truncated records.

We also run all of our other databases, workers, and services in this cluster with no issues (not to mention that FDB works fine without TLS).

It’s also very curious that only getting the status fails. Configuring the database, setting and reading keys all work, even if the CLI says the database is unavailable.

From what I can see when running tcpdump for both sides, the client is sending this alert to the server after the handshake completes successfully.
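To line tcpdump output up with TLS records, it can help to walk the 5-byte record headers in one direction of the reassembled TCP payload (for example, saved as raw bytes from Wireshark). Under TLS 1.3, every post-handshake record, alerts included, is encrypted and carries the outer type application_data (23) with legacy version 0x0303. An alert is therefore only identifiable by its position and size, not by its header. This is a minimal sketch, assuming the input starts at a record boundary.

```python
import struct
import sys
from typing import Iterator, NamedTuple

CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
HEADER_LEN = 5  # type (1) + legacy_record_version (2) + length (2)


class RecordHeader(NamedTuple):
    offset: int        # where the header starts in the stream
    content_type: int
    version: int
    length: int        # length of the record body following the header
    complete: bool     # False if the body runs past the end of the captured bytes


def walk_tls_records(stream: bytes) -> Iterator[RecordHeader]:
    """Yield TLS record headers from one direction of a TCP stream."""
    offset = 0
    while offset + HEADER_LEN <= len(stream):
        content_type, version, length = struct.unpack_from("!BHH", stream, offset)
        end = offset + HEADER_LEN + length
        yield RecordHeader(offset, content_type, version, length, end <= len(stream))
        offset = end


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for rec in walk_tls_records(f.read()):
            name = CONTENT_TYPES.get(rec.content_type, "unknown")
            print(f"@{rec.offset:8d} {name:18s} v=0x{rec.version:04x} "
                  f"len={rec.length} complete={rec.complete}")
```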

I’m going to try to deploy this into a separate vanilla environment to see if it’s a rare condition that triggers it.

I don’t know the details of your Kubernetes setup but you should check your host network configuration and the configuration of the CNI (the networking component for Kubernetes). Those errors are usually caused by a mismatch in the MTU configuration, so that part of the packet is dropped.

Hi Johannes, thanks for the reply.

Would you mind elaborating on what you mean by an MTU configuration mismatch? Between the CNI and the host?

I don’t manage this cluster, but we run tons of workloads here including several databases (all of them with TLS) and afaik we haven’t seen something like this before.

I had initially suspected record truncation based on that error message, but I didn’t see any truncated buffer issues in the interfaces or dropped packets.

By now I’ve narrowed the error down to fdbcli. The handshake succeeds, but after a few encrypted records, the CLI sends this alert to the server and restarts the connection.

I was even able to connect to the database reliably by compiling the CLI locally for ARM on 7.1.67 and connecting to the Kubernetes cluster, so it does look like a problem with running fdbcli from those pods (including the operator, sadly).

The error also appears to be intermittent: anecdotally, around 10% of our client connection attempts succeed.

Would the status TLS records be particularly large?

Additional note from the team that manages the cluster:

We don’t use overlay networking for our Kubernetes setup, so there are no weird MTU nesting issues. Each pod is directly attached to the EC2 network via ipvlan, and the MTU is 9001 across the board.

An update in case anyone faces a similar problem:

I ended up tracking this problem down to the database writing the whole status struct in one go in the status response, which resulted in a single write_some call of more than 20KB.

I tuned the flow knob MAX_PACKET_SEND_BYTES to 8KB, and this fixed the problem for me. Since we’ll be using FoundationDB as a metadata store for Apache Doris, I’m not too concerned about the potential performance impact, but we’ll keep an eye on it.

This certainly seems to hint at a problem with fragmenting or reassembling large TLS records, but I even tried upgrading FDB to 7.4.3, linked with OpenSSL 3, and it made no difference.

I’ll update the thread if we find anything else, but I’m happy to leave it at that for now.
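For a rough picture of what the knob change does on the wire: TLS caps a record's plaintext at 2^14 bytes (RFC 8446, section 5.1). A single write of just over 20KB must therefore be split into one full-size record plus a remainder, whereas writes capped at 8KB each fit in one record. This is a back-of-the-envelope sketch, assuming TLS 1.3 with AES-256-GCM (as negotiated in the s_client output above), no record padding, and a TLS layer that fills records to the maximum. The 20KB figure is illustrative.

```python
from typing import List

MAX_PLAINTEXT = 2 ** 14   # RFC 8446 section 5.1: max plaintext per record
HEADER = 5                # outer type + legacy version + length
AEAD_OVERHEAD = 1 + 16    # inner content-type byte + 16-byte GCM tag (no padding)


def capped_writes(total: int, cap: int) -> List[int]:
    """Split one logical send of `total` bytes into writes of at most `cap` bytes."""
    return [min(cap, total - start) for start in range(0, total, cap)]


def record_wire_sizes(writes: List[int]) -> List[int]:
    """On-the-wire size of each TLS 1.3 record produced by a sequence of writes."""
    sizes = []
    for n in writes:
        while n > 0:
            chunk = min(n, MAX_PLAINTEXT)
            sizes.append(HEADER + chunk + AEAD_OVERHEAD)
            n -= chunk
    return sizes


if __name__ == "__main__":
    status_bytes = 20 * 1024  # illustrative size of the status reply
    print("one write :", record_wire_sizes([status_bytes]))
    print("8KB capped:", record_wire_sizes(capped_writes(status_bytes, 8 * 1024)))
```

This only models record framing. Whether the full-size record is what trips decryption is still the open question in the thread.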

@johscheuer, is there any chance we could get a dump of sysctl -a from a production k8s cluster running mTLS on your end? Our infra team would like to see if they can figure out why the problem shows up for us but not elsewhere.

No worries if that’s not possible, though.

I feel like this is an MTU issue.

AWS has some specific guidance on MTUs for certain cases:

Does this apply to you?

Hi Semisol,
No, this is all within the same region and VPC, sadly, and the MTU is 9001 across the board.

Even if the MTU were smaller or different, I’d expect it to be negotiated, with TCP segmenting the stream accordingly and the OpenSSL layer under these Boost streams reassembling the TLS records. For some reason, though, that’s failing.

The max TLS record size is 16KB, so I’m guessing these large responses will be fragmented on any network (likely just at the origin in this intra-VPC case). The question, imo, is why OpenSSL seemingly can’t reassemble them.

This would only happen if they were not on the same L2 network. MTU signaling only happens at L3, when packets go through an intermediate router; otherwise oversized packets are dropped or truncated.

Hmm, you’re right about the negotiation, but I think dropped packets would just time out rather than produce this error, no? (I don’t think truncated packets would ever be made available to read under TCP.)

Both machines have the same MTU either way:

cat /sys/class/net/eth0/mtu
9001

I also ran tcpdump on both hosts, and the packets are exactly the same on either end.

What I did notice today, though, is that the large TLS record we fail to decrypt shows up as a single packet in tcpdump (with length 16458).

This is presumably handled by TSO (TCP segmentation offload) and shouldn’t even be visible to the OpenSSL layer, but it’s interesting:

ethtool -k eth0
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on [fixed]
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on [fixed]
	tx-tcp-ecn-segmentation: on [fixed]
	tx-tcp-mangleid-segmentation: on [fixed]
	tx-tcp6-segmentation: on [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
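The numbers in that capture add up cleanly, which supports reading the 16458-byte packet as exactly one whole TLS record handed to an offload engine. A small arithmetic sketch, assuming a 20-byte IPv4 header without options and a 32-byte TCP header (20 bytes plus the `nop,nop,TS val … ecr …` options tcpdump prints). A packet larger than the 9001-byte MTU can only show up in a capture when segmentation (TSO/GSO on the sender) or receive coalescing (GRO/LRO on the receiver) happens below the point where tcpdump taps the stack.

```python
IPV4_HEADER_LEN = 20       # IPv4 header without options
TCP_HEADER_LEN = 20 + 12   # base TCP header + NOP, NOP, timestamp (10 bytes) options
TLS_HEADER_LEN = 5         # TLS record header


def tcp_payload_len(ip_total_len: int) -> int:
    """TCP payload bytes in an IPv4 packet with the headers assumed above."""
    return ip_total_len - IPV4_HEADER_LEN - TCP_HEADER_LEN


def larger_than_mtu(ip_total_len: int, mtu: int) -> bool:
    """A packet above the MTU cannot have crossed the link as a single frame."""
    return ip_total_len > mtu


if __name__ == "__main__":
    payload = tcp_payload_len(16458)
    print(payload, payload - TLS_HEADER_LEN, larger_than_mtu(16458, 9001))
```

This prints `16406 16401 True`: the payload is a 5-byte record header plus a 16401-byte ciphertext, and the packet is bigger than the interface MTU.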

I guess at this point I’ll check what OpenSSL is actually seeing in ssl3_get_record.

Have you tried disabling it?

Hi folks, sorry for the long delay here. We left this as-is and moved on to testing our deployment of Apache Doris, but unfortunately the error still persists.

I’ve been debugging the TLS records more deeply, and I think packet boundaries really do have something to do with this.

What I can’t explain is why this only seems to happen in our environment, but I wanted to share the details that show what happens to the last bytes of the record.

I compiled FDB with a modified openssl-1.1.1m that adds logging to the tls13_enc function in openssl-1.1.1m/ssl/record/ssl3_record_tls13.c. It logs the first and last bytes of each TLS record as it enters encryption on the server side and decryption on the client side (happy to share this code if you’re interested).

Here is the logging for the operation that gets the cluster status (luckily, the failure always happens in our environment unless we cap writes or slow them down in some way, so it’s very consistent):

SERVER:

Record of len 8209 entered tls13_enc for DECRYPTION with sequence number: 0000000000000318. First bytes: ea6cedafcb53c0c443bc9910a570812b8c70f4fabf9e17853b2b8d21208d62a23b731175c91538f91468fc909d61a63e24ed7a97887a4021a881349919638d05. Last bytes: 87bdcdad6c742216d8d3deafa7efb43e1596ebe72c5698cefe5f27cbf601816e13d14c943de65f15e8de2fcf650a1d468358b76fdcde413731cf86e5adbf1937
Large record of len 8209 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 8553 entered tls13_enc for DECRYPTION with sequence number: 000000000000026b. First bytes: 83a3479ada3e3fb96c733105bde816f30c649ace87f67547d7bfd1c66fcf967cccb01b4c47ef1bca7ddb13b5d7a167d20fb09d729bde7a4182b10ecc72404890. Last bytes: eec3b71c737c9401ed4d80ea432388ab7f71882fe0a45cf3d126c65d4ee93c2ecaa250f596779394b7fb3c3617d6f4ad712c56bf03f1bf0b2bb37333b7d2d6c0
Large record of len 8553 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 9321 entered tls13_enc for DECRYPTION with sequence number: 00000000000004ea. First bytes: 0b5412a0b9aacef0b7f1009226f2b321c374abb2028845f86c3cc20b84d3b533f7dd9f228e4fb309f976816778992755912813f1a36315291e1ba684b44d8e1e. Last bytes: a406c887dce945934a2563e53714ce9a82fe516dd395f7a74973ec8bcd083716b58e2757f71c9f3fce5581eb8bc77ac90f21a1f306724889bfe27ef525990e88
Large record of len 9321 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 8209 entered tls13_enc for DECRYPTION with sequence number: 00000000000002b1. First bytes: 5addfe4a3bf101be1fe6ea8b1a29405340fa84c0d4c25a85130aa8320e0b90cee3514186eb4fb063a4e5afc67de4403c666d5649a60d8d2345ec0310d12b0fbc. Last bytes: d87f05ed8c69368e24f7ace7d363560c031c6ecc8e0a6a6da79349ad9125624476d797d0c8e77984260197eff9b599ed4ae077f7cb40ed370e2bb7f6a71cd64b
Large record of len 8209 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 9321 entered tls13_enc for DECRYPTION with sequence number: 000000000000056b. First bytes: 8191d4d56ba230a32c69b36e6e73e8af8be32d09adcd00d7ebbbd2fb80330dbfd0909cdfef1fbe2d2bf0bbe0b57bce4a640face5cb50630e1a5c30f5f10a1a25. Last bytes: ff287f52b47dfba3e0f9f92392185586dab990844f440f0cd81226c88d161b64bccbb15c8c11c22d993c3129fec28bdaf064c2594d0ecfd58faf4fdfc03ecbb5
Large record of len 9321 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 8209 entered tls13_enc for DECRYPTION with sequence number: 00000000000004f1. First bytes: 186f81b211ced6d749a0ae948e9c4c975cd9c8ef4c04929517a99c072fb623bcae62d4358fef4daf98b93ec8cd02ea4680c2f9e760da09b82db1adb823882db5. Last bytes: da099605be5e5b9ae72d292f3cec1eee4eeae3ce076bc14ae8745c06e322ddc9e571f6cc2844363037963b557bab95b88fd5548071ea85b897c7ddb12d296227
Large record of len 8209 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 8577 entered tls13_enc for DECRYPTION with sequence number: 000000000000027b. First bytes: 64034533cba39a0ae7cf45766f8ffaec0f3d0a3e78863bed9fa4c7a5818aa8cea1e6def74f8ef2cd5b9c8cdcfa721ffd5da60e1f4fa21ce46e073237daa0cf83. Last bytes: 74dd7dece9bf7308d189c5565e7676ba75d34d861b616fa39d0fd02461c5729242093d3ee5beb1cf84fd7c7183a263947d4add9c5d48e9fec352fef3f56d09dd
Large record of len 8577 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 8989 entered tls13_enc for DECRYPTION with sequence number: 0000000000000280. First bytes: f5a9cc9e0252c105064322d6a680e1da357037c7107c261ec0b5877605a0b81ac56685a049ea83104a472215b6f01beddfe0df39170f5a1daac5945a75bef5fb. Last bytes: 0b78d8044ba24ff65566fc9401b215119c39a252b790a07384e7d6507fa9254e20f4bb97832ae761b8e3d302bfd7486eded6631c5d8feb26efcadde34fa840fa
Large record of len 8989 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 8193 entered tls13_enc for ENCRYPTION with sequence number: 0000000000000003. First bytes: 5881000048d554b81d7461351e000000823e5d8c20000000584a9802000000000600080004000600060004000800090008000400080000000800000002000000. Last bytes: 7265706c696361735f72656d61696e696e67223a327d2c227465616d5f747261636b657273223a5b7b227072696d617279223a747275652c22696e5f666c6917
Record of len 8209 succeeded ENCRYPTION with sequence number: 0000000000000003. First bytes: 1a6703c1440f435f10c9efefd4220bbce8439dcaa268d461bff3a20cf8361f563cbfdda02cb665c30d53e155b723e05264048ba41bc4fb770e1d14b102e8e625. Last bytes: 5cd568fcafc968ede0a0370346096b8b87fcfcb2e665aea35ed9187ec8036914ca768c9651cbbe31cf7b3fd84e7964beb88b565f82aa24e4d9aaeda856e72065
Record of len 16385 entered tls13_enc for ENCRYPTION with sequence number: 0000000000000004. First bytes: 6768745f6279746573223a302c22756e6865616c7468795f73657276657273223a302c227374617465223a7b226865616c746879223a747275652c226e616d65. Last bytes: 5c2243445c22222c2274797065223a2253797374656d4572726f72222c226e616d65223a22696e7465726e616c5f6572726f72222c2264657363726970746917
Record of len 16401 succeeded ENCRYPTION with sequence number: 0000000000000004. First bytes: 31f966b3038ebfd7ed8171118e663ddafb73fd2f4ca55276b6da0c994f5eb0e860672d6a34cd94d3924dc42770fe348941ba0a85a3266b053a1914c02145230b. Last bytes: 3d80b0d2dd526731c17656eb45a9cddcc34faa7d6d5d5157de1de0600feb82ab82251a3cf2d17ff45fbd163a09103f6082466848eb91b33f08fcb3a07f9c2983
Record of len 16385 entered tls13_enc for ENCRYPTION with sequence number: 0000000000000005. First bytes: 6768745f6279746573223a302c22756e6865616c7468795f73657276657273223a302c227374617465223a7b226865616c746879223a747275652c226e616d65. Last bytes: 5c2243445c22222c2274797065223a2253797374656d4572726f72222c226e616d65223a22696e7465726e616c5f6572726f72222c2264657363726970746917
Record of len 16401 succeeded ENCRYPTION with sequence number: 0000000000000005. First bytes: 7c5d2228c4441f23e36f4a9446fb8631786d1987452033419be7db0abb105728b8811389803ee06b3141699c6126530cb628ffd15996683263101704c779a79d. Last bytes: 90cd3a20c84c9a5951934a97e9758ec8c426e4cb3b3044280fa73c68f53f3adaa7623212681fa3b92605c97b6b99f80726c1cd5ab8350d8244a3e3a4d82666a0
Record of len 8541 entered tls13_enc for ENCRYPTION with sequence number: 0000000000000006. First bytes: 6f6e223a2253797374656d4572726f723a20696e7465726e616c5f6572726f7220617420576564205365702031372031313a33303a33322032303235227d5d2c. Last bytes: 3a317d2c226d65737361676573223a5b5d2c22636c75737465725f636f6e74726f6c6c65725f74696d657374616d70223a313735383130383932347d00000017
Record of len 8557 succeeded ENCRYPTION with sequence number: 0000000000000006. First bytes: b524a8794aeba19bb2556d4a41e56991206969042e977b3d863d5c37d516c3af5be025d2b4637d482ea6ded69ddf177350f1899de722fe8a08db4febbc32e5ff. Last bytes: 72637c35e235fd1ee6dce768194f2316f06d712b5c58818fc9e773097146e1a9ff3030cf328f351ab8c209f8922c6e57bc83f9dfe086694454ca628d761f3365


CLIENT:

Record of len 8209 entered tls13_enc for DECRYPTION with sequence number: 0000000000000003. First bytes: 1a6703c1440f435f10c9efefd4220bbce8439dcaa268d461bff3a20cf8361f563cbfdda02cb665c30d53e155b723e05264048ba41bc4fb770e1d14b102e8e625. Last bytes: 5cd568fcafc968ede0a0370346096b8b87fcfcb2e665aea35ed9187ec8036914ca768c9651cbbe31cf7b3fd84e7964beb88b565f82aa24e4d9aaeda856e72065
Large record of len 8209 SUCCEEDED in decryption. ivlen: 12, taglen: 16
Record of len 16401 entered tls13_enc for DECRYPTION with sequence number: 0000000000000004. First bytes: 31f966b3038ebfd7ed8171118e663ddafb73fd2f4ca55276b6da0c994f5eb0e860672d6a34cd94d3924dc42770fe348941ba0a85a3266b053a1914c02145230b. Last bytes: 3d80b0d2dd526731c17656eb45a9cddcc34faa7d|header?: 1703034011|start of record with seq 5: 7c5d2228c4441f23e36f4a9446fb8631786d1987452033419be7db0abb105728b8811389803ee0
Large record of len 16401 FAILED to decrypt. ivlen: 12, taglen: 16

Notice how the client sees the start of the record with sequence number 5 embedded in the last bytes of the record with sequence number 4. Those last bytes are where record 4’s authentication tag should be, so the tag is invalid and decryption fails:

Record of len 16401 entered tls13_enc for DECRYPTION with sequence number: 0000000000000004. First bytes: 31f966b3038ebfd7ed8171118e663ddafb73fd2f4ca55276b6da0c994f5eb0e860672d6a34cd94d3924dc42770fe348941ba0a85a3266b053a1914c02145230b. Last bytes: 3d80b0d2dd526731c17656eb45a9cddcc34faa7d|header for rec 5: 1703034011|start of record 5: 7c5d2228c4441f23e36f4a9446fb8631786d1987452033419be7db0abb105728b8811389803ee0

These bytes match what the server produced at the start of record 5 (compare its post-encryption first bytes):

Record of len 16401 succeeded ENCRYPTION with sequence number: 0000000000000005. First bytes: 7c5d2228c4441f23e36f4a9446fb8631786d1987452033419be7db0abb105728b8811389803ee0

What is very suspicious is that this clipping happens right at a packet boundary (and they’re pretty small packets, at that):

    fdb-cluster-cluster-controller-70563.fdb-cluster.doris.svc.iad03-test.k8s.run.ipsec-nat-t > 172.18.105.160.28018: Flags [P.], cksum 0x501f (incorrect -> 0x69c1), seq 24976:29072, ack 2689, win 16, options [nop,nop,TS val 447068986 ecr 679635158], length 4096
	45001034d87a40007f06d350ac167e2f
	ac1269a011946d728b96898c3a476cd6
	80180010501f00000101080a1aa5bb3a
	288268d6ce2e88459d5380289d581c44
	3546c26224cb762adc21b793b27fbc1c
	89387785faf7c9273eba12983a58075e
	8147151822ab73a910163af6cf66c6d5
	873c479b2b4ec6b34a4cdce776f1a049
	8ce650d61d3c7e10fa85deed8ba32f0e
	152163e5f002b49dd6b2f85a2fd06fc3
	ad0da002040110983de458c7f386ef17
	9d53d5f132024dc6d70ca7ad3bee7be8
	e86e64a4fd49c489f67c0cc068c0e3b1
	7089e2fef408198bfddd4aa71504440d
	8d4d2fea0dce31e460f8f7fdcdee9333
	f622a2ddfffe1463f487f7c4ad2231d3
	96f1073b2a625a116646f13bf0aebd23
	c8ade8a0b6dcbf92664cd7ddff98562d
	936f04c4a6c226044cbabab98371f52e
	2e26cfabf92d5937a8b4524012b02323
	a5ea9e0a6ebcd680eab59711d74c693b
	d7930c22eed15bb6493a142b7e433cb3
	20f4adf079121126bb54e8e97ddd0f8f
	cee76e9db27457c87ce73c7c4c7991ea
	9571f320a1c2b4aec09f97b8c7c6b9c3
	2d3af4a35d33dafe193f111ef2fbbfcf
	c2e8cdf6fdfd8df673508283709c729c
	74df0d84062ac72e2e00d9317b6052dd
	f431f318a3f66afb6c5558e89c790233
	34ea1f11ea5eee773c6392a28d05bc41
	ec14759e696f7d814f97a059cd2adc4e
	5db2bfdebf33324c9f7617f4c5b3bde6
	9f8a7af14a2608802168bc0bcc814c66
	8363f9f8e694dc0552f52fc10458a85b
	5ec1defc29ace0d75f9d9ea8f504305d
	50949f8b9d456fbd619a9897d2ec02c1
	31105c8a5d331245955237bdcffa701e
	4f59904a0715e00befd0fcd1284c65c0
	e771efd569f198e4dfad06421b0f1d9a
	658aab359b8e9b3e152b6a121dde0bae
	65d076d90b414e09cd56602c8f2e1a51
	20a1c850fcee907005781e6da0ad1389
	1c3ee3d134864f8ab70eca3599ba9f29
	8bf79d8259e66074710b2393d75d91fc
	5c383390bb8ab59b2c58b1267785cf91
	4a8ea3275e888f44c0fdc533ab24b3d4
	a7b17af96be929c8678f04d0e827ec87
	6fa714cb9b1fec216086fbb96dcbca9c
	7db9045c8a47c7c9889aaaa99737c4c4
	a7be290b31a74ee33e40bcb35e8ed87d
	a09188788b832188256f7f1cb0abaa29
	7d5deb45ddd9506dba5ba7c7cf651111
	3d1d39328a6284cd6b2c9ca94e870222
	03fd78f4007d11f5724bcf4c06a00b43
	f1de3d137bccd11de6d288cb64005ae9
	b9667eb40e0623fb590a2d55de30e619
	e58d07ee11928a145ece3598c4e6b9a6
	7c54e234f5f7e7ed4d9a2fcddddf8b09
	5c2e51a42252f25088a57e1c6124f6ad
	0a15c1926275034f60bfa7f9d0d87283
	6b751bb82aecdf9abec5c99ed33c07dd
	a11753f296540509bc7170ad45be53f0
	d041789d887dff5877a5f0fe5e955ca0
	7c5e05cf9660c209515fd0eb18f12b0b
	c641368620f3fc25abca6f67e2d5d1d7
	bf890e52abeff574daf8a2db9a2cf68b
	7fea8ae92e7d39772b65b3d8513aeca8
	285eb3252668b5a7c8d385dbec93a572
	5bf1c51038d72899602d7d33bd79305b
	5124c8fd342d92e867bad4a1a8357d42
	19f5bc11cc3626f9ad7c382d03b9a212
	2cf8832c00120e7fb5ca8d6b4d57236e
	78028c620c9be2fdcf7974c874706092
	6ddf90a38b8a9ae6b4ca5b59bc765b21
	5a9c587883736b63f79c230a1f1d4200
	a74a74c20e4b09a38f5f324898924ab9
	f3bfb56498db772caabfa71f61d70f6a
	fcf6ec7758c698efe86f57741ee499ef
	f6bdc7b99131af7b79a756bac26e52c7
	ab8b733f6253694a863aa416f4dac237
	f2999d6f26938f9ea832678d3a4d3999
	8240a0c89b278517926c1349c34e80e1
	de313d1fcf2017a0681de34dab8bcf29
	a63226aa68cef5f3d5a16ea8f516ed06
	4b6c939145c442882e7997a91c3860ca
	c9594da392f059a19e99e33e4566b3a1
	85ee803470df026fd8170a9f92aacc75
	a23bd1126facbc1c1e5a5058436cec9d
	de5e22c859a975da1125e41088d5e108
	4e61712f6e7525d13666b39a8624eeac
	be4303259eb6b996801620ad5ee38e21
	39e90e80e62fabd22d017b3ce2dbe950
	a5e9685352187f9b2ae6428df2fa5063
	cf80fe09d0cd438c6ae80a5bd1d8b781
	c4996973907301a1b0855d818080a20c
	d673903e52a2e4533763b04fc4796c3a
	ad7965e3d4ea9e3f10cc217656cf5475
	5586f87e376c128a8b2ed2c470c8df90
	179e5f5eb7093a97337c34b10cb54c88
	bfae151b138c851357c99524cdd34605
	16b22a6ea78431d7c7e0aaef5904ef8b
	fd964a31ef085cdd669d94686a07dbcd
	357d333af128617ef0abcb89028c479f
	36ec01f8ef175728fca81dd45f263b95
	8ceb6cbfe5179cf340b527b08635735d
	69972abd976fa9c5ba9a0f237a99f909
	5d4a7397d9fc9fe6afbfe8de40358ab1
	782ed570dd95be03119e6caeb1a36bfb
	2c7702ed59b21b76110892ec152e7e58
	35f63903c9d4e4110c832238b3987bbd
	1899ce04abf23d127044f4c997202b22
	b4afac29c8ac1aa203f0224f2be4d7eb
	ce8495baf5fef760f651c3ede91f0643
	3d56dee357a6e6df452a30021cc5748c
	52edb5037408b861c7d1130c8d599912
	eb6ca4b39d44a2a4b2df53c185f66865
	e4d8b945f188e90c56b3e3677aa43b01
	71ffefbc7d30808028bb674e7e50d40b
	d3715e4bf8906de1b816f3d11b74251f
	598ced30e488212b5f0b53a22c963394
	728f32ab23e36a35203ff1aa0ac68d58
	0edda5511d5c55e7d26b1f9855e930c0
	f2dc1f26ae4da6f45ad4b77cf0b3f6b4
	75dba5ae76c15463b1cd38c75699a586
	523beabde8daebfdae70e4ccf25d1e3f
	16e3166ead8265cd276c2460f76f0315
	7880f91f19ed1e650b47fdc9dea5cef6
	0ce92b95a96ce46b0338ebfcd8170720
	b0ee62bf9eeca6faffb208d930ec5909
	dbe1d388b203240962b854f9bb4d0e81
	352c3f617e63659c95d2687b974b3e96
	1c0cf50c8801ad0cd9129f60fd8fd0ff
	dbaee1230137631b692e719511adccf7
	08d36b4832a8f60e002555c0140738ca
	f6b422c0483494967db3fe50a244b1bf
	295c7262ff7d7bf0019e96a9d6af06cc
	903e27c70aa7428574b9219cffe32579
	0f07a4ef56b465160977052c867461f3
	24f841897e02945d1244546ff8d3c616
	a72104e91a8c86ec1a0d7012a910ef50
	daafc318ea08e2b393559ef2c19de2d8
	258b71634a46fd5abd3ec14a8c1e6596
	43a133a7d2ed157d32a9e6f57b00e4dc
	592bade88a4b2644a1d682acf7e086d3
	963e6a0fd19a77aa7c2e06cd76597ff4
	ffb67c1a8a8abd60381b3634c0d422ba
	09f9ee3025a8df4f82641649aad42bd5
	18f640aab26716888ebfe182b1564757
	0b615f529c459467065669d1375b2641
	b97fba3f2339644024b577d58b20202b
	8c4e708ad0e8a7064ae2ed81f1c7dc88
	02cb3d67a9b27de6e57fa876699bca73
	d802e153016b4fd49620bf3502591573
	66dc796b3e54cfc208b052666fb04636
	48df6313cae535fa2994e98ca4bd94c3
	d146150d43534f8b29aba831a6da9e58
	035013bf66dce0cf267d6a895b3cfd67
	676de9d629af07baf3afed73fad19343
	288716f99f29b6f092ae2131fb2a637b
	47c078267ea83eca65041dd40dc780ad
	807296df26ecfd63908565161ff5336f
	84611569d4914397adde591c3519af4a
	73e6a21d730d110a84a4ba0414fba895
	273a48bb30718b53f12230942f800da0
	16ca73a1334336fdcdcc3f17dfca8a95
	a0d314a6e633e410eddb26c8e2e80d3c
	d679c8474481d01cc29c8689e7bd2a4a
	2ba83e31d7f2c76b42d9b5264a3c0466
	5f78b32e2b7b1e8b5ebb17a2ed3c9760
	2a2c3ae39f5af9d6a6bfd05be5959c27
	cb6204cfc182c1a7eb0cd122b83ab76c
	79a156b035343af1b13ac64eb17eda71
	b1736da4c8bab70d9457798df3df3774
	e7787625268f2d166df733196ee9aa7f
	d10468ec7b42e5e244261e1918b044d2
	7099a6e49fefee0a6d6df2ae0808de3a
	2087f0a8903ad4e1c0377102fc7b26ba
	933aefed0d7a31063f1e647c14900c41
	6cb42237e109d3d785e26e350583fa95
	94a28fcd764696520ece0f8d59c91228
	8cb3bb8c538619dcecf7c459535a3a0a
	37c15e98542c856c43eeaa32857c5b47
	b17586a3c245b954728b7c6625617a82
	50f95811bdd72a89958a8646092de56a
	5c3b3bff791e7e30916610a71ef6905a
	fbb3c79a417cb4cb0bd959f7e0cc0d75
	ae63a27931dbdfcd6d7a8d9f649edb5b
	4159daba83da5e3a66c43222d3f8c393
	bcee27f02a6d184e69a112b28a29a54a
	d6c68a89da4db34c88004ddc73a58eba
	26d6c2343b7f88cdfd0b315d13d3e52f
	20242f436a2983041e5308ae69dd00bf
	9872efce6defd3e8f9cdb5c12a3756c1
	f78c98b206b054f098546c58a784355e
	30f9ad12be3720f71467cb18ad13ef32
	c6e6383620d04df1a47080a0fdcf6607
	de7f0af1d036bdae8e5ec06a43ca4fbf
	dc442b287abfb4a509ed17e009121bbc
	2eeb1bfffd8fc8e42d9353e444177d97
	ab1f6f02f8e999d1373e4e56a7054a23
	d1be5a9c7c9c32263c26dc7c780c2e6a
	cd027262ddff40bdc47c2825bdc93405
	1777960c5107e0806af839b644ea33b8
	81b5e035aafeb3cc27be2e97e267395f
	db26367bc29c00211ef41d0b1ff46b2d
	e4a91155f614bb91b0f12bf7963fcb45
	2bd6bf01f9663147198ff4787e6f1e0e
	15b96fe600ef85823799166cc29e532b
	acb2c3de768d53bc4a8733f836292d20
	444c430f40018668f18628f4ca86aaed
	b1588f77372843f4022203a9fe37b534
	cebfe20578bd14fa5e0dca432b8f853e
	907ea150811067379dc0322e5ce8dac3
	3e3123f37eb34afc71b5015221491fcb
	3f246d7fb94c0b7d5107c0d739e4a32e
	32779fb9e941b33aca2bf5806c25d691
	338673b045e04f2a08e51366fbcd493a
	db93a3a2ae60c1b635b42c8a78845142
	f2481326ee4b63bbbb4534176c97fa97
	3e6b3d16ea53c964ad2b732960cf654a
	bb5512082f0603d7b46c4f668077e7f8
	cd95a70c05529e1425ea7f9807a5dc7f
	2427060bb53e7bf82292579bf55126ac
	f9eceb053d1694fbfb0e5b72503594e6
	6d18bd4e93a2c889fda9b336a80b39df
	0624f3a49f0a059f81fcfa71fe6f81ca
	660ded2b65c0c8d6c1c82ea5bb0312d0
	604a91cf699d80470b239792e7b481ab
	b81100f121bbe0d89e3641f2d6b6cf7b
	3a992cccd0ca96b963189d6258e3db8b
	5e47e34f6aaa04e433a153ec061e7ec3
	dec1421a69ea9cf7d6ff48d0895085f8
	e02c78c6faaaf6183b46653be2619c65
	aad1bb3832d48b0426e045ac1906fdfd
	3eb6b644af0ee92b4eeaa4c2630d3b0c
	9ca2b01aa1bb45ab04fd5b4d6b58d1e9
	40b53115cd267e38a0ca0baa9cd1474b
	c92a4f09dc367223d1de1c5feae09e4a
	87c59f36932b85313f18c8a4e6f940a0
	89d6c94195876c25a589e2d928b3ff42
	6fb760f9d14d3fcb9a59bdb518bd5c8e
	6eec68ae890f6f2fb7a4bcffb1a623ad
	aec4518890046f434efa08f5b8e70cc5
	167f47f1e8d4e4915247324a18a86504
	1054aff01f89f033db5cc4d9be2e49fc
	71c74629d0a4bde13b4640bb1953f265
	9a36dfa0ce28d1d5ffb4c5a4dc19f11c
	d20ba98b44d94afc27124aa5eea01fe1
	24fbdaf3e47374f45d12951be410b9ac
	b2e4f3c77de7949294fb1dc15d392b1f
	f5655a840ca243316ee7e74994022cb5
	9fd50d865c3946fe7d21cc868dda74fb
	685fd3a60b7c2da0b7a957937dbd2f54
	89becc3c6dcc63d38821d1b999df9e1e
	bc13cf75016b2cf41ded96dc9aed3e7d
	b881d7c29369823f53c9e11080302fbb
	e881cd3b30e1014e10d65b280c078089
	cc72c57b082f87df113da2e72134086d
>>>	<LAST BYTES OF ORIGINAL RECORD>3d80b0d2dd526731c17656eb45a9cddc
	c34faa7d
11:35:24.988840 IP (tos 0x0, ttl 127, id 55419, offset 0, flags [DF], proto TCP (6), length 16458)
    fdb-cluster-cluster-controller-70563.fdb-cluster.doris.svc.iad03-test.k8s.run.ipsec-nat-t > 172.18.105.160.28018: Flags [P.], cksum 0x8035 (incorrect -> 0x8b40), seq 29072:45478, ack 2689, win 16, options [nop,nop,TS val 447068986 ecr 679635158], length 16406
	4500404ad87b40007f06a339ac167e2f
	ac1269a011946d728b96998c3a476cd6
	80180010803500000101080a1aa5bb3a
	288268d6<START OF NEXT RECORD HERE>17030340117c5d2228c4441f
	23e36f4a9446fb8631786d1987452033
	419be7db0abb105728b8811389803ee0
	6b3141699c6126530cb628ffd1599668
	3263101704c779a79d55f9c6be516abf
	383c3a3bd835440b676a159b3dc9a844
	0dcc5ccc813984fb03883d70645d4908

I don’t see the original last bytes of the record anywhere in the TCP stream: 6d5d5157de1de0600feb82ab82251a3cf2d17ff45fbd163a09103f6082466848eb91b33f08fcb3a07f9c2983
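The byte counts line up: the 20 bytes before the record-5 header in the first packet, plus the 5-byte header and the 39 bytes of record 5 that follow it, make up the 64 "last bytes" the client logged, so 44 bytes of record 4 are missing from the captured stream. One way to check for this kind of gap mechanically is to walk the record headers through the reassembled TCP payload: each header is a 1-byte content type (0x17 for application data), a 2-byte legacy version (0x0303) and a 2-byte length (0x4011 = 16401 here), so a missing tail makes the next expected header land inside the following record. A minimal Python sketch, assuming the in-order payload of one direction has already been extracted from the capture and starts at a record boundary; it does not decrypt anything:

```python
from __future__ import annotations

import struct

RECORD_TYPES = {20, 21, 22, 23}  # change_cipher_spec, alert, handshake, application_data
LEGACY_VERSION = 0x0303          # legacy_record_version used by TLS 1.2/1.3 records
MAX_CIPHERTEXT = 2**14 + 256     # RFC 8446 limit on a record's ciphertext length


def walk_records(stream: bytes) -> tuple[list[tuple[int, int, int]], int | None]:
    """Follow record headers through `stream`.

    Returns the (offset, content_type, length) of each header found, plus the
    offset where the next expected header did not look valid (None if the
    framing held until the end of the data). A final record may be truncated.
    """
    records: list[tuple[int, int, int]] = []
    offset = 0
    while offset + 5 <= len(stream):
        content_type, version, length = struct.unpack_from("!BHH", stream, offset)
        if (content_type not in RECORD_TYPES or version != LEGACY_VERSION
                or length > MAX_CIPHERTEXT):
            return records, offset  # these bytes do not look like a record header
        records.append((offset, content_type, length))
        offset += 5 + length
    return records, None


def find_hex(stream: bytes, hex_bytes: str) -> int:
    """Offset of a logged hex string (e.g. a record's last bytes) in the stream, or -1."""
    return stream.find(bytes.fromhex(hex_bytes))
```

On the failing capture, a walk started at record 4’s header would expect the next header 16406 bytes later, which falls 44 bytes past the real record-5 header, inside record 5’s ciphertext, and `find_hex` on the logged tail `6d5d5157…` would return -1.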

At this stage we’ll look for some workarounds, but I wanted to share this in case it rings a bell.