Hi everyone (my first post here).
We have started using FoundationDB with a Kubernetes operator installation on AWS EKS. After adding a sidecar container to ship the trace logs with Severity > 20 to a central location, we noticed that all storage servers are constantly logging SlowSSLoopx100 messages (roughly every 15s).
Thinking we must have made some configuration mistake (the production cluster we are testing has somewhat different specs than the setup below), I went and did a very basic install of:
- the default cluster provided by the k8s operator: fdb-kubernetes-operator/config/tests/base/cluster.yaml at 585b2e0037fb20d61103c7cd1f894e6449aa4367 · FoundationDB/fdb-kubernetes-operator · GitHub
- 2 c7i.xlarge nodes (4 vCPUs and 16 GB of memory each)
- the storage class is gp3
… and the results were similar: a lot of SlowSSLoopx100 events, e.g.
trace.10.100.29.70.4501.1750674795.uJrSIG.0.1.xml:<Event Severity="20" Time="1750675166.314823" DateTime="2025-06-23T10:39:26Z" Type="SlowSSLoopx100" ID="c6437fbcc6166a71" Elapsed="0.107017" ThreadID="1680275776998010791" Machine="10.100.29.70:4501" LogGroup="test-cluster" Roles="RV,SS" />
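(To quantify "a lot": something along the lines of the sketch below can tally these events and their Elapsed values across the trace files. The trace.* file-name prefix matches what fdbserver writes by default; the helper itself is just illustrative.)

```cpp
// Illustrative helper: count SlowSSLoopx100 events across FDB trace XML files
// in a directory and report the largest Elapsed value seen.
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char** argv) {
    const std::string dir = argc > 1 ? argv[1] : ".";
    const std::regex slowEvent("Type=\"SlowSSLoopx100\".*Elapsed=\"([0-9.]+)\"");

    std::size_t count = 0;
    double maxElapsed = 0.0;
    for (const auto& entry : std::filesystem::directory_iterator(dir)) {
        const std::string name = entry.path().filename().string();
        if (!entry.is_regular_file() || name.rfind("trace.", 0) != 0)
            continue; // only look at trace.* files
        std::ifstream in(entry.path());
        std::smatch m;
        for (std::string line; std::getline(in, line);) {
            if (std::regex_search(line, m, slowEvent)) {
                ++count;
                maxElapsed = std::max(maxElapsed, std::stod(m[1].str()));
            }
        }
    }
    std::cout << "SlowSSLoopx100 events: " << count
              << ", max Elapsed: " << maxElapsed << " s\n";
}
```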
My main question is: is anyone else seeing the same messages in their clusters?
Has anyone troubleshot something like this before? It is especially puzzling since I have been looking into CPU/disk and nothing stands out; everything is as idle as it can be. It makes no sense for this event loop to take over 50ms (as per the code: foundationdb/fdbserver/storageserver.actor.cpp at 2b9b0f778a6ccdcb82bee1b945cbd99e17e1c1b3 · apple/foundationdb · GitHub).
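If I am reading the linked code correctly, the check is roughly: the main storage server loop records the time at the top of each iteration, and if a whole iteration took more than 50ms it emits the warning, sampled at about 1 in 100 occurrences (hence the x100 suffix), so each logged event presumably stands for many more slow iterations. A self-contained sketch of that pattern (plain C++ with illustrative names, not the actual flow/actor code):

```cpp
// Sketch of the SlowSSLoopx100 detection pattern: measure the wall-clock time
// of each loop iteration and log (with ~1% sampling) when it exceeds 50 ms.
#include <chrono>
#include <cstdio>
#include <random>
#include <thread>

int main() {
    std::mt19937_64 rng{std::random_device{}()};
    std::uniform_real_distribution<double> random01(0.0, 1.0);

    using clock = std::chrono::steady_clock;
    auto lastLoopTop = clock::now();

    for (int i = 0; i < 500; ++i) {
        const auto loopTop = clock::now();
        const double elapsed =
            std::chrono::duration<double>(loopTop - lastLoopTop).count();

        // Previous iteration took > 50 ms: warn, but only for ~1 in 100 slow loops.
        if (elapsed > 0.050 && random01(rng) < 0.01)
            std::printf("SlowSSLoopx100 Elapsed=%.6f\n", elapsed);

        lastLoopTop = loopTop;

        // Stand-in for one iteration of real work; occasionally simulate a stall.
        std::this_thread::sleep_for(std::chrono::milliseconds(i % 50 == 0 ? 120 : 10));
    }
}
```

So the Elapsed="0.107017" in the event above means that particular iteration took over 100ms on an otherwise idle box, which is what I cannot explain.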
Thanks everyone