Setting CPU affinity for fdbservers

cespare · April 19, 2018, 10:52pm

I’m reading this documentation and it says

The core FoundationDB server process is fdbserver. Each fdbserver process uses up to one full CPU core, so a production FoundationDB cluster will usually run N such processes on an N-core system.

To make configuring, starting, stopping, and restarting fdbserver processes easy, FoundationDB also comes with a singleton daemon process, fdbmonitor, which is started automatically on boot. […]

Does fdbmonitor pin fdbserver processes to CPUs? From poking around the code, it seems like the answer is no.

I would think that this is important for best performance. At our company we run a large cluster of redis servers, which are also single-threaded, and in order to maximize throughput and minimize latency we have to pin each redis-server process to a dedicated CPU and also pin NIC rx queue IRQs to different CPUs.
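A minimal sketch of that kind of pinning on Linux, using Python's `os.sched_setaffinity`. The helper name `pin_to_cpu` is hypothetical and this is not something fdbmonitor does; it only illustrates restricting one process to one core.

```python
import os

def pin_to_cpu(pid, cpu):
    """Restrict `pid` (0 means the calling process) to a single CPU.

    Returns the affinity mask the kernel reports afterwards.
    Linux-only: os.sched_setaffinity is not available on every platform.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Calling `pin_to_cpu(0, 2)` would confine the current process to CPU 2, provided that CPU is in the set the process is allowed to use.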

(And on large machines with multiple NUMA nodes it seems even more important to ensure that at least each server process is pinned to a single node so that it cannot allocate memory on the wrong domain.)
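To see which CPUs belong to which NUMA node, Linux exposes a `cpulist` file per node under `/sys/devices/system/node`. A rough sketch of reading it follows; the function names are hypothetical and the code assumes the usual `0-3,8-11` range syntax.

```python
from pathlib import Path

def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty string or trailing comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def numa_topology(root="/sys/devices/system/node"):
    """Map NUMA node id -> list of CPU ids, read from sysfs (Linux only)."""
    topo = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        topo[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return topo
```

With a map like this, each server process could be given CPUs from a single node only.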

Does this concern make sense, or have I misunderstood something? Are these features which fdbmonitor could add one day, or is the idea that in certain cases FDB users ought not to use fdbmonitor and should run the fdbservers according to their particular needs?

alexmiller · April 19, 2018, 11:19pm

No, I agree: from the code, it appears that it does not. It looks like once upon a time the resolver tried to be CPU/NUMA-aware to some degree, but that code was commented out. It’s worth considering, and I’ve filed issue 196 to track it. Though isolating low-latency processes and network interrupts is a good idea, I’m not clear that having fdbmonitor change to which core NIC IRQs are sent wouldn’t be seen as surprising behavior.

Thanks!

cespare · April 19, 2018, 11:36pm

I’m not clear that having fdbmonitor change to which core NIC IRQs are sent wouldn’t be seen as surprising behavior.

Oh yeah, I definitely agree that fdbmonitor shouldn’t be doing that. (And on a lot of Linux systems it would just be fighting with irqbalance anyway.) I just brought it up as an example of the types of stuff that might all have to be coordinated together.

Therefore it seems like having a way to tell fdbmonitor which CPUs it should be using would be best. Then I can pin NIC IRQs elsewhere and also possibly isolcpus all of these cores.
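As a sketch of what such a setting might drive, assuming a hypothetical list of allowed CPUs supplied by the operator: each server process gets one CPU from the list, and the launch command is prefixed with `taskset -c`. Nothing here is an existing fdbmonitor option; `plan_pinning` and `taskset_prefix` are made-up names, and the process ids (ports) are illustrative.

```python
def plan_pinning(process_ids, cpus):
    """Assign each process id one dedicated CPU from the operator's list.

    Raises ValueError when there are more processes than CPUs, since
    sharing a core would defeat the point of pinning.
    """
    if len(process_ids) > len(cpus):
        raise ValueError("not enough CPUs for one dedicated core per process")
    return dict(zip(process_ids, cpus))

def taskset_prefix(cpu):
    """Command prefix that runs the following command pinned to `cpu`."""
    return ["taskset", "-c", str(cpu)]
```

For example, `plan_pinning([4500, 4501], [2, 3, 4])` gives `{4500: 2, 4501: 3}`, leaving CPU 4 free for NIC IRQs or anything else.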

Ramaprakash · April 20, 2018, 4:40pm

We should also increase the priority, so that a non-priority process does not kick the fdbserver process off the CPU.
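A rough sketch of raising priority on Linux via the nice value, using Python's `os.setpriority`. The helper name is hypothetical. Lowering the nice value (that is, raising priority) normally needs root or `CAP_SYS_NICE`, so the sketch reports failure instead of raising.

```python
import os

def try_set_niceness(pid, niceness):
    """Set the nice value of `pid` (0 means the calling process).

    Returns True on success, False if the kernel refuses for lack of
    privileges. Lower nice values mean higher scheduling priority.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
    except PermissionError:
        return False
    return True
```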

panghy · November 18, 2018, 1:24am

Just commented on the issue: we do see significant improvements to hot spots in the cluster when processes are pegged with numactl.
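For illustration only, a sketch of building such a numactl invocation, binding both CPU and memory to one NUMA node with `--cpunodebind` and `--membind`. The helper name and the example command are assumptions, not the exact setup described above.

```python
def numactl_command(node, command):
    """Wrap `command` (a list of argv strings) so it runs with CPU and
    memory both bound to NUMA node `node`."""
    return [
        "numactl",
        f"--cpunodebind={node}",  # run only on CPUs of this node
        f"--membind={node}",      # allocate memory only from this node
        *command,
    ]
```

The resulting list can be passed to `subprocess.Popen`, with the server's usual arguments appended to `command`.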
