Fdbmonitor configuration reload not stopping old fdbserver processes

davisp · October 9, 2019, 6:30pm

I’m not 100% sure if this is a bug or not but figured I’d leave a note. I’ve got a test cluster that was accidentally reconfigured to have a single fdbserver per node due to some Chef mishaps. While debugging what had happened, I discovered that one of the nodes still had the entire set of processes from the pre-single-fdbserver config in the cluster.

Out of curiosity I touched /etc/foundationdb/foundationdb.conf to see if maybe Chef had done something odd that prevented fdbmonitor from noticing the new config file. After I touched foundationdb.conf, fdbmonitor logged that it was deconfiguring each of the expected processes to match the new config, yet none of the excess fdbserver processes were stopped, and they are all still part of the cluster.

Also to be clear, we’re being particularly brutal when we apply new test configurations. These are generally accompanied by rm -rf $data_dir type operations as we reset a test cluster for benchmarking. It just surprised me enough to make a note.

davisp · October 9, 2019, 7:15pm

This turns out to be more of a confusing documentation issue. Our configs set kill_on_configuration_change = false, which apparently means “don’t kill fdbserver processes when the configuration changes”. Setting it to true causes fdbmonitor to actually kill child processes.

The current documentation reads:

If kill_on_configuration_change parameter is unset or set to true in foundationdb.conf then fdbmonitor will restart on changes automatically. If this parameter is set to false it will not restart on changes.

I took that to mean that fdbmonitor itself would restart and rely on a daemon supervision system to restart the whole process tree. Instead it appears to actually mean:

If kill_on_configuration_change parameter is unset or set to true in foundationdb.conf then fdbmonitor will restart fdbserver processes on configuration changes automatically. If this parameter is set to false it will not restart any fdbserver processes on configuration changes.
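For anyone who wants to check what their own nodes are set to, here is a minimal sketch in Python that reads a foundationdb.conf and reports the effective value, treating "unset" as true per the documentation quoted above. Assumptions on my part: the sample config contents are made up for illustration, the helper name kills_on_change is hypothetical, the option is looked up only in the [general] section, and Python's configparser is close enough to fdbmonitor's own ini parsing for this purpose.

```python
import configparser

# Illustrative config only; the values are assumptions, not taken from a real node.
SAMPLE_CONF = """
[general]
restart_delay = 60
cluster_file = /etc/foundationdb/fdb.cluster
kill_on_configuration_change = false

[fdbserver]
command = /usr/sbin/fdbserver
datadir = /var/lib/foundationdb/data/$ID

[fdbserver.4500]
"""


def kills_on_change(conf_text: str) -> bool:
    """Return True if managed processes would be killed on a config change.

    Hypothetical helper: only looks in [general], and falls back to True
    when the option is unset, matching the documented default.
    """
    # Disable interpolation so values such as "$ID" or "%" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean(
        "general", "kill_on_configuration_change", fallback=True
    )


if __name__ == "__main__":
    if kills_on_change(SAMPLE_CONF):
        print("managed processes are restarted on configuration changes")
    else:
        print("managed processes are left running on configuration changes")
```

To run it against a real node, read /etc/foundationdb/foundationdb.conf into a string and pass that to kills_on_change instead of SAMPLE_CONF.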

ajbeamon · October 9, 2019, 8:28pm

Ah yes, your observation is correct (although the managed processes need not be fdbserver specifically), and I can see why the current documentation is confusing. If you’re interested in filing a PR clarifying the docs, that would be great. Or if you’d prefer, I can update it.

davisp · October 16, 2019, 5:46pm

I’ll file a PR to learn how the docs work.
