Coordinator performance

evkuzin · August 22, 2025, 3:40pm

Hi folks!
I’m running an FDB cluster on 30 servers, with 96 CPUs/workers each. I noticed that the coordinator worker is pegged at 100% CPU much of the time, and after profiling I see that ClusterGetStatusActor is consuming the majority of the CPU cycles.

The profile shows multiple concurrent calls to latestEventOnWorkers() for different event types:

MachineMetrics

ProcessMetrics

NetworkMetrics

TraceFileOpenError

ProgramStart

etc.

I noticed that this can be tuned with server knobs like status_min_time_between_requests, but I could not find any recommended values (or I might be misreading the documentation).

So the question is: what would be the recommended setting?
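
For a rough sense of why status gathering gets expensive at this size, here is a back-of-the-envelope sketch in Python. It is not FoundationDB code. It assumes one fdbserver process per CPU (30 × 96), assumes that each latestEventOnWorkers() call reaches every worker once, and counts only the five event types named above. The function names and the example interval values are hypothetical, not FoundationDB defaults or recommendations.

```python
# Back-of-the-envelope model of status fan-out; not FoundationDB code.
# Assumptions: one fdbserver process per CPU, and every
# latestEventOnWorkers() call touches every worker once.

SERVERS = 30
PROCESSES_PER_SERVER = 96  # "96 cpu/workers each" from the post
EVENT_TYPES = [
    "MachineMetrics",
    "ProcessMetrics",
    "NetworkMetrics",
    "TraceFileOpenError",
    "ProgramStart",
]  # the post ends with "etc.", so this is only a lower bound


def lookups_per_status_request(processes: int, event_types: int) -> int:
    """Worker event lookups needed to build one status document."""
    return processes * event_types


def max_lookups_per_second(per_request: int, min_interval_s: float) -> float:
    """Upper bound on lookups/s if status builds are spaced at least
    min_interval_s apart (what the knob name
    status_min_time_between_requests suggests it controls)."""
    if min_interval_s <= 0:
        raise ValueError("min_interval_s must be positive")
    return per_request / min_interval_s


processes = SERVERS * PROCESSES_PER_SERVER
per_request = lookups_per_status_request(processes, len(EVENT_TYPES))

if __name__ == "__main__":
    print(f"processes: {processes}")
    print(f"lookups per status request: {per_request}")
    for interval in (0.5, 1.0, 5.0):  # hypothetical values, in seconds
        rate = max_lookups_per_second(per_request, interval)
        print(f"min interval {interval}s -> at most {rate:,.0f} lookups/s")
```

Under these assumptions a single status request costs at least 14,400 worker lookups, so spacing status builds further apart lowers the ceiling proportionally. The real cost per lookup and the real behavior of the knob would need to be confirmed against a profile.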

gxglass · August 24, 2025, 1:18am

No great recommendation (from me, anyway). AI found this, where we document O(500) cores/processes: Known Limitations — FoundationDB ON documentation

That said, if you can provide a more detailed profile, maybe there is something we can do here.
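
To put the O(500) figure next to the cluster described in the question, a minimal Python sketch of the arithmetic follows. It assumes one fdbserver process per CPU on every server, which the original post implies but does not state outright. The constant and function names are hypothetical.

```python
# Compare the cluster in this thread with the scale mentioned in the reply.
# Assumption: one fdbserver process per CPU on every server.

TESTED_PROCESSES = 500  # the "O(500) cores/processes" figure from the reply


def scale_ratio(servers: int, processes_per_server: int,
                tested: int = TESTED_PROCESSES) -> float:
    """How many times larger the cluster is than the documented figure."""
    return (servers * processes_per_server) / tested


ratio = scale_ratio(30, 96)

if __name__ == "__main__":
    print(f"{30 * 96} processes, ratio to documented figure: {ratio:.2f}")
```

If that assumption holds, the cluster runs several times more processes than the documented figure, which would be consistent with status collection becoming a bottleneck.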
