Hi folks!
I’m running an FDB cluster on 30 servers, with 96 CPUs/workers each. I noticed that the coordinator worker is often pegged at 100% CPU, and after profiling I see that ClusterGetStatusActor is consuming the majority of the CPU cycles.
The profile shows multiple concurrent calls to latestEventOnWorkers() for different event types:
MachineMetrics
ProcessMetrics
NetworkMetrics
TraceFileOpenError
ProgramStart
etc.
I noticed that this can be tuned with server knobs such as status_min_time_between_requests, but I could not find any recommended values (or I might be misreading the documentation).
So the question is: what settings would you recommend?
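To make the tuning idea concrete, here is a generic sketch of what a "minimum time between requests" throttle usually does: an expensive status computation runs at most once per interval, and callers inside that interval get the previous result. This is not FoundationDB's actual implementation. The class name MinIntervalCache, its parameters, and the 1.0 second interval are hypothetical and chosen only for illustration, not as a recommended value.

```python
import time
from typing import Callable, Optional


class MinIntervalCache:
    """Hypothetical throttle: recompute at most once per `min_interval` seconds."""

    def __init__(
        self,
        compute: Callable[[], dict],
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute            # the expensive call, e.g. gathering status
        self._min_interval = min_interval  # assumed interval in seconds
        self._clock = clock                # injectable so the behaviour can be tested
        self._last_time: Optional[float] = None
        self._last_result: Optional[dict] = None
        self.compute_calls = 0             # counts how often the expensive call ran

    def get(self) -> dict:
        now = self._clock()
        stale = (
            self._last_time is None
            or now - self._last_time >= self._min_interval
        )
        if stale:
            # Enough time has passed (or this is the first request): recompute.
            self._last_result = self._compute()
            self._last_time = now
            self.compute_calls += 1
        # Otherwise reuse the previous result instead of recomputing.
        return self._last_result


if __name__ == "__main__":
    cache = MinIntervalCache(lambda: {"healthy": True}, min_interval=1.0)
    for _ in range(5):
        cache.get()  # five back-to-back requests
    print("expensive computations run:", cache.compute_calls)
```

The trade-off in a throttle like this is that a larger interval reduces CPU spent on status collection but makes the reported status staler.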