Storage server spending a large amount of CPU time in the network stack

In production, on one of our clusters running 7.3.43 (our clusters handle different kinds of workload, so one is not comparable to another), we are seeing very high CPU usage (90%+) on some storage servers. I did some flamegraph profiling (__run_timers.part.0 (250,000 samples, 0.01%)) and it seems that we spend quite a lot of CPU time sending and receiving network data: for instance, _libc_recv takes close to 10% of CPU and _sys_sendmsg takes 25%.

I’m wondering whether there is any tuning that would reduce the share of CPU spent on network operations. I haven’t really found any guidelines.

I don’t know whether there is any tuning we can do. It might be worth looking more closely at the number of requests and the bandwidth used by the storage server. The StorageMetrics event has information about the number of requests. If this storage server is receiving more traffic than the others, maybe the problem is related to a “hot” shard? The transaction_profiling_analyzer.py tool (foundationdb/contrib/transaction_profiling_analyzer/transaction_profiling_analyzer.py at main in apple/foundationdb on GitHub) can help debug hot shards.
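
To compare request volume across storage servers, something like the rough sketch below might be a starting point. It is not an official tool and makes several assumptions that you should check against your own trace files: that the logs are in the XML format with one `<Event ... />` per line, that the process is identified by a `Machine` attribute, and that the counter of interest (here `FinishedQueries`, chosen only as an example) is a string whose first token is a rate. The function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches key="value" pairs inside one XML trace line.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_event(line):
    """Return the attributes of one trace line as a dict (empty if none)."""
    return dict(ATTR_RE.findall(line))


def mean_rate_by_server(lines, field="FinishedQueries", key="Machine"):
    """Average the first numeric token of `field` per server.

    `field` and `key` are assumed attribute names; adjust them to
    whatever your StorageMetrics events actually contain.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        attrs = parse_event(line)
        if attrs.get("Type") != "StorageMetrics":
            continue
        if field not in attrs or key not in attrs:
            continue
        tokens = attrs[field].split()
        if not tokens:
            continue
        try:
            value = float(tokens[0])  # assumed to be the rate
        except ValueError:
            continue
        sums[attrs[key]] += value
        counts[attrs[key]] += 1
    return {server: sums[server] / counts[server] for server in sums}


def outliers(rates, factor=2.0):
    """Servers whose mean rate exceeds `factor` times the median rate."""
    if not rates:
        return []
    ordered = sorted(rates.values())
    median = ordered[len(ordered) // 2]
    if median <= 0:
        return []
    return sorted(s for s, v in rates.items() if v > factor * median)
```

Feeding it the lines of the trace files from all storage processes and calling `outliers(mean_rate_by_server(lines))` would list the servers that stand out. A server that shows up there is a candidate for the hot shard investigation; one that does not suggests the CPU cost comes from somewhere other than an uneven request load.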