In production, one of our clusters running 7.3.43 is showing very high CPU usage (90%+) on some storage servers. (Our clusters handle different kinds of workloads, so one is not comparable to another.) I did some flamegraph profiling, and it seems we spend quite a lot of CPU time sending and receiving network data: for instance, _libc_recv is taking close to 10% of CPU and _sys_sendmsg takes 25%.
I’m wondering whether there is any tuning that could reduce the share of CPU spent on network operations. I haven’t really found any guidelines.