When fdbmonitor handles a signal, it seems to send a HUP to its parent process. This is unexpected (to me) and a problem for the infrastructure I use to manage long-running processes.
P.S. It’s a little unclear where best to file bug reports, so just let me know if this would be better filed on GitHub.
Steps to reproduce:
- run fdbmonitor in a bash script that logs HUP signals
- send a TERM signal to fdbmonitor
- expect no signal to the parent process, but see a HUP logged
$ uname -a
Linux hostname 4.15.3-1.el7.elrepo.x86_64 #1 SMP Mon Feb 12 06:46:25 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
catch_signals.sh
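#!/bin/bash
# Print this wrapper's PID so it can be matched against the pstree output.
echo "$0 PID $$"
cleanup_exit() {
    echo "exiting"
}
cleanup_hup() {
    echo "got HUP"
}
# Log when the wrapper exits and whenever it receives SIGHUP.
trap cleanup_exit EXIT
trap cleanup_hup HUP
echo "running $*"
# Quote "$@" so arguments containing spaces are passed through unchanged.
"$@"
echo "got exit code $?"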
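The HUP trap in the script above can be sanity-checked without FoundationDB. In this minimal sketch the wrapper's traps mirror catch_signals.sh, and a hypothetical stand-in child (not fdbmonitor) signals its parent directly with kill -HUP "$PPID". Bash does not run a trap for a signal that arrives while it is waiting for a foreground command; it runs the trap once that command finishes. So "got HUP" is printed only after the child exits, which matches the order in the logs (got HUP, then got exit code 0).

```shell
#!/bin/bash
# Same traps as catch_signals.sh, inlined via bash -c so the check is self-contained.
wrapper='trap "echo exiting" EXIT
trap "echo got HUP" HUP
"$@"
echo "got exit code $?"'

# Hypothetical stand-in child: signal the parent (the wrapper) directly,
# keep running for a moment, then exit with status 0.
child='kill -HUP "$PPID"; sleep 1; echo "child done"; exit 0'

# The wrapper receives HUP while the child is still running, but its trap
# only fires after the child has exited.
output=$(bash -c "$wrapper" wrapper bash -c "$child")
printf '%s\n' "$output"
```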
logs
~/catch_signals.sh /usr/lib/foundationdb/fdbmonitor --conffile /data/fdbdata/conf/foundationdb.conf
/root/catch_signals.sh PID 23509
running /usr/lib/foundationdb/fdbmonitor --conffile /data/fdbdata/conf/foundationdb.conf
Time="1529940896.120814" Severity="10" LogGroup="default" Process="fdbmonitor": Started FoundationDB Process Monitor 5.1 (v5.1.5)
Time="1529940896.121028" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf file /data/fdbdata/conf/foundationdb.conf
Time="1529940896.121039" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf dir /data/fdbdata/conf (2)
Time="1529940896.121051" Severity="10" LogGroup="default" Process="fdbmonitor": Loading configuration /data/fdbdata/conf/foundationdb.conf
Time="1529940896.121435" Severity="10" LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
Time="1529940896.121587" Severity="10" LogGroup="default" Process="fdbmonitor": Starting fdbserver.4700
Time="1529940896.122254" Severity="10" LogGroup="default" Process="fdbserver.4700": Launching /usr/sbin/fdbserver (23512) for fdbserver.4700
Time="1529940896.122272" Severity="10" LogGroup="default" Process="backup_agent.1": Launching /usr/lib/foundationdb/backup_agent/backup_agent (23511) for backup_agent.1
Time="1529940896.159089" Severity="10" LogGroup="default" Process="fdbserver.4700": FDBD joined cluster.
Time="1529940913.173416" Severity="20" LogGroup="default" Process="fdbmonitor": Received signal 15 (Terminated), shutting down
got HUP
got exit code 0
exiting
In a separate session, show the process tree and send TERM to fdbmonitor:
$ pstree -ap 23509
catch_signals.s,23509 /root/catch_signals.sh /usr/lib/foundationdb/fdbmonitor --conffile /data/fdbdata/conf/foundationdb.conf
└─fdbmonitor,23510 --conffile /data/fdbdata/conf/foundationdb.conf
├─backup_agent,23511 --cluster_file /data/fdbdata/conf/fdb.cluster --logdir /log/fdb
│ ├─{backup_agent},23517
│ └─{backup_agent},23518
└─fdbserver,23512 --cluster_file /data/fdbdata/conf/fdb.cluster --datadir /data/fdb/4700 --listen_address 0.0.0.0:4700 --logdir /log/fdb --public_address ...
├─{fdbserver},23513
├─{fdbserver},23514
├─{fdbserver},23515
└─{fdbserver},23516
$ kill 23510
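It may also help to capture each process's process group, for example with ps -o pid,ppid,pgid,comm -p 23509,23510 while both are running. A bash script without job control does not move its children into a separate process group, so catch_signals.sh and fdbmonitor presumably share one. If fdbmonitor signals its whole process group on shutdown (for example with kill(0, SIGHUP) to stop its children), the wrapper would receive that HUP too. This is an assumption that has not been checked against fdbmonitor's source. The sketch below uses a hypothetical stand-in child instead of fdbmonitor. It shows that a group-wide HUP reaches a parent in the same group, and that starting the child under setsid (util-linux) prevents this.

```shell
#!/bin/bash
# Hypothetical stand-in for the suspected (unconfirmed) fdbmonitor behaviour:
# ignore HUP itself, then send HUP to every process in its own process group.
child='trap "" HUP; kill -HUP 0; exit 0'

# Minimal wrapper modelled on catch_signals.sh. It runs without job control,
# so a child started as plain "$@" inherits the wrapper's process group.
wrapper='trap "echo got HUP" HUP
"$@"
echo "got exit code $?"'

# The outer setsid gives each wrapper a fresh process group, so the
# group-wide HUP never reaches the shell running this demo.
shared=$(setsid bash -c "$wrapper" wrapper bash -c "$child")

# Same child, but started via setsid inside the wrapper: it gets its own
# process group, so "kill -HUP 0" only reaches the child, which ignores it.
isolated=$(setsid bash -c "$wrapper" wrapper setsid bash -c "$child")

printf 'shared group:\n%s\n' "$shared"
printf 'separate group:\n%s\n' "$isolated"
```

If the real fdbmonitor behaves like the stand-in, running setsid /usr/lib/foundationdb/fdbmonitor --conffile /data/fdbdata/conf/foundationdb.conf from the wrapper might avoid the stray HUP. The trade-off is that fdbmonitor and everything it launches would no longer share the wrapper's process group.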