Pods trying to connect to IPs no longer assigned to FDB pods

I’m starting to see some odd behavior while investigating other issues:
lots of pods are trying to connect to IPs that are no longer assigned to any current FDB pod.
For instance, on the pod currently holding the CC (cluster controller) role:

{  "Severity": "10", "Time": "1745441774.112936", "DateTime": "2025-04-23T20:56:14Z", "Type": "ConnectingTo", "ID": "0000000000000000", "SuppressedEventCount": "14", "PeerAddr": "10.193.1.48:4500:tls(fromHostname)", "PeerAddress": "10.193.1.48:4500:              tls(fromHostname)", "PeerReferences": "14", "FailureStatus": "FAILED", "ThreadID": "xxx", "Machine": "10.193.0.244:4500", "LogGroup": "fdb-cluster", "Roles": "CC" }
...
{  "Severity": "10", "Time": "1745449091.502429", "DateTime": "2025-04-23T22:58:11Z", "Type": "ConnectionClosed", "ID": "0000000000000000", "Error": "connection_failed", "ErrorDescription": "Network connection failed", "ErrorCode": "1026",                        "SuppressedEventCount": "7", "PeerAddr": "10.193.1.48:4500:tls(fromHostname)", "PeerAddress": "10.193.1.48:4500:tls(fromHostname)", "ThreadID": "xxx", "Machine": "10.193.0.244:4500", "LogGroup": "fdb-cluster", "Roles": "CC" }

The IP 10.193.1.48 is currently allocated in my k8s cluster, but to a non-FDB pod, and you can see the issue has been there for at least 2 hours. I suspect that at some point an FDB pod had this IP, but it has been gone for a while. Why are processes still trying to connect to IPs that are (long) gone?
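In case it helps anyone reproduce this, here is roughly how I cross-checked the stale peers. It's a minimal sketch that assumes JSON-format trace logs like the ones above; the trace file path and the pod label selector are placeholders for my setup, not anything standard:

```python
import json
import subprocess

# Assumed values -- adjust to your deployment.
TRACE_FILE = "/var/log/fdb-trace-logs/trace.json"               # hypothetical trace file path
POD_SELECTOR = "foundationdb.org/fdb-cluster-name=fdb-cluster"  # hypothetical label selector

def current_fdb_pod_ips():
    """Return the set of pod IPs currently assigned to FDB pods."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-l", POD_SELECTOR, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    pods = json.loads(out)["items"]
    return {p["status"].get("podIP") for p in pods if p["status"].get("podIP")}

def stale_peers(trace_file, live_ips):
    """Collect peer IPs from ConnectingTo / ConnectionClosed events that no
    longer belong to any FDB pod."""
    stale = set()
    with open(trace_file) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if event.get("Type") not in ("ConnectingTo", "ConnectionClosed"):
                continue
            # PeerAddr looks like "10.193.1.48:4500:tls(fromHostname)".
            peer_ip = event["PeerAddr"].split(":")[0]
            if peer_ip not in live_ips:
                stale.add(peer_ip)
    return stale

if __name__ == "__main__":
    live = current_fdb_pod_ips()
    for ip in sorted(stale_peers(TRACE_FILE, live)):
        print(f"stale peer: {ip}")
```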

That’s a bug in the networking layer in FDB. Every fdbserver process keeps an in-memory map of connected peers, and in theory old peers should be removed when they disappear. We are already investigating this issue, but it’s a bit more complex to solve. The same issue can exist on the client side. If those extra connection attempts cause problems, restarting the fdbserver processes will clean up the in-memory map.
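To make the failure mode concrete, here is a toy sketch in Python (not FDB's actual C++ networking code, just an illustration of the behavior described above): a peer map whose stale entries are never evicted, so the process keeps producing ConnectingTo / ConnectionClosed-style failures for an address that no longer belongs to any FDB pod until the process itself is restarted.

```python
import time

class PeerMap:
    """Toy model of a per-process peer table: entries are added on first
    contact and, in this buggy variant, never evicted when the peer is gone."""

    def __init__(self):
        self.peers = {}  # address -> consecutive failure count

    def note_peer(self, address):
        self.peers.setdefault(address, 0)

    def retry_loop(self, is_reachable, rounds=3):
        for _ in range(rounds):
            for address in list(self.peers):
                if is_reachable(address):
                    self.peers[address] = 0
                else:
                    # Stale entry: the pod behind this IP is gone, but the
                    # address stays in the map, so the reconnect attempts
                    # (and the trace events) never stop.
                    self.peers[address] += 1
                    print(f"ConnectingTo {address} failed "
                          f"({self.peers[address]} consecutive failures)")
            time.sleep(0.1)

# Example: 10.193.1.48 once hosted an FDB pod, but the IP has been reassigned.
peer_map = PeerMap()
peer_map.note_peer("10.193.0.244:4500")  # still a live FDB process
peer_map.note_peer("10.193.1.48:4500")   # stale entry left over from an old pod

live = {"10.193.0.244:4500"}
peer_map.retry_loop(lambda addr: addr in live)
# Only restarting the process (i.e. recreating PeerMap) clears the stale entry.
```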
