Hi everyone (my first post here).
We have started using FoundationDB with a Kubernetes operator installation on AWS EKS. After adding a sidecar container to ship the trace logs with Severity > 20 to a central location, we noticed that all storage servers are constantly logging SlowSSLoopx100 messages (roughly every 15s).
Thinking we must have made some configuration mistake (the production cluster we are testing has somewhat different specs than the setup below), I went and did a very basic install of:
- the default cluster provided by the k8s operator: fdb-kubernetes-operator/config/tests/base/cluster.yaml at 585b2e0037fb20d61103c7cd1f894e6449aa4367 · FoundationDB/fdb-kubernetes-operator · GitHub
- 2 c7i.xlarge nodes (4 vCPUs and 16 GB of memory each)
- the storage class is gp3
… and the results were similar: a lot of SlowSSLoopx100 events, e.g.
trace.10.100.29.70.4501.1750674795.uJrSIG.0.1.xml:<Event Severity="20" Time="1750675166.314823" DateTime="2025-06-23T10:39:26Z" Type="SlowSSLoopx100" ID="c6437fbcc6166a71" Elapsed="0.107017" ThreadID="1680275776998010791" Machine="10.100.29.70:4501" LogGroup="test-cluster" Roles="RV,SS" />
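(To quantify "a lot": something along the lines of the sketch below can tally these events and their Elapsed values across the trace files. The trace.* file-name prefix matches what fdbserver writes by default; the helper itself is just illustrative.)

```cpp
// Illustrative helper: count SlowSSLoopx100 events across FDB trace XML files
// in a directory and report the largest Elapsed value seen.
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char** argv) {
    const std::string dir = argc > 1 ? argv[1] : ".";
    const std::regex slowEvent("Type=\"SlowSSLoopx100\".*Elapsed=\"([0-9.]+)\"");

    std::size_t count = 0;
    double maxElapsed = 0.0;
    for (const auto& entry : std::filesystem::directory_iterator(dir)) {
        const std::string name = entry.path().filename().string();
        if (!entry.is_regular_file() || name.rfind("trace.", 0) != 0)
            continue; // only look at trace.* files
        std::ifstream in(entry.path());
        std::smatch m;
        for (std::string line; std::getline(in, line);) {
            if (std::regex_search(line, m, slowEvent)) {
                ++count;
                maxElapsed = std::max(maxElapsed, std::stod(m[1].str()));
            }
        }
    }
    std::cout << "SlowSSLoopx100 events: " << count
              << ", max Elapsed: " << maxElapsed << " s\n";
}
```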
My main question is: is anyone else seeing the same messages in their clusters?
Has anyone troubleshot something like this before? It is especially puzzling since I have been looking into CPU/disk and nothing stands out; everything is as idle as it can be. It makes no sense for this event loop to take over 50ms (as per the code: foundationdb/fdbserver/storageserver.actor.cpp at 2b9b0f778a6ccdcb82bee1b945cbd99e17e1c1b3 · apple/foundationdb · GitHub).
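If I am reading the linked code correctly, the check is roughly: the main storage server loop records the time at the top of each iteration, and if a whole iteration took more than 50ms it emits the warning, sampled at about 1 in 100 occurrences (hence the x100 suffix), so each logged event presumably stands for many more slow iterations. A self-contained sketch of that pattern (plain C++ with illustrative names, not the actual flow/actor code):

```cpp
// Sketch of the SlowSSLoopx100 detection pattern: measure the wall-clock time
// of each loop iteration and log (with ~1% sampling) when it exceeds 50 ms.
#include <chrono>
#include <cstdio>
#include <random>
#include <thread>

int main() {
    std::mt19937_64 rng{std::random_device{}()};
    std::uniform_real_distribution<double> random01(0.0, 1.0);

    using clock = std::chrono::steady_clock;
    auto lastLoopTop = clock::now();

    for (int i = 0; i < 500; ++i) {
        const auto loopTop = clock::now();
        const double elapsed =
            std::chrono::duration<double>(loopTop - lastLoopTop).count();

        // Previous iteration took > 50 ms: warn, but only for ~1 in 100 slow loops.
        if (elapsed > 0.050 && random01(rng) < 0.01)
            std::printf("SlowSSLoopx100 Elapsed=%.6f\n", elapsed);

        lastLoopTop = loopTop;

        // Stand-in for one iteration of real work; occasionally simulate a stall.
        std::this_thread::sleep_for(std::chrono::milliseconds(i % 50 == 0 ? 120 : 10));
    }
}
```

So the Elapsed="0.107017" in the event above means that particular iteration took over 100ms on an otherwise idle box, which is what I cannot explain.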
Thanks everyone