I’m relaying an installation issue from someone onsite, so I may not have all the details. Note: the issue has since been worked around, so unfortunately I can’t get any more details.
After installing 5.2.5 in production on 3 hosts running RHEL 7.4, hosts #1 and #3 work perfectly, but all the processes on host #2 stopped working: they could not create new log files, could not create or update the data files, and were also reporting the incorrect_cluster_file_contents error.
Running fdbcli on that host worked fine (it could see all the other nodes of the cluster). Multiple reboots did not change the behavior. Each host runs 4 processes (+ 1 backup agent).
Edit: the admin uninstalled FoundationDB and reinstalled it from scratch (with the proper permissions on the mounted partitions), and now everything works as intended.
I asked the person onsite to list the processes, and something weird is happening: foundationdb.conf is set up to spawn 4 processes + 1 backup agent, but fdbmonitor is spawning 5 additional instances of itself (with the same arguments) running as root, instead of spawning the fdbserver / backup agent processes.
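For anyone wanting to capture the same process listing as text instead of a screenshot, here is a minimal Python sketch (Linux only, not an official FoundationDB tool) that reads /proc and counts the FoundationDB-related processes by owner. The helper names are hypothetical; only the binary names fdbmonitor, fdbserver and backup_agent come from the install. Assuming the usual setup where fdbmonitor runs as root and its children run as the foundationdb user, a healthy host should show 1 fdbmonitor, 4 fdbserver and 1 backup_agent, while host #2 would presumably show 6 fdbmonitor entries owned by root and no fdbserver at all.

```python
import os
import pwd
from collections import Counter

# Binary names to look for; every other process in /proc is ignored.
FDB_NAMES = {"fdbmonitor", "fdbserver", "backup_agent"}


def effective_uid(status_text):
    """Extract the effective UID from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            # Format: "Uid:  real  effective  saved  filesystem"
            return int(line.split()[2])
    raise ValueError("no Uid: line in status file")


def list_processes(proc_root="/proc"):
    """Yield (effective_uid, command_name) for each PID directory under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(pid_dir, "comm")) as f:
                comm = f.read().strip()
            with open(os.path.join(pid_dir, "status")) as f:
                uid = effective_uid(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or entry is unreadable
        yield uid, comm


def user_name(uid):
    """Map a UID to a user name, falling back to the numeric UID."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def fdb_census(processes):
    """Count FoundationDB-related processes by (user, command name)."""
    return Counter(
        (user_name(uid), comm) for uid, comm in processes if comm in FDB_NAMES
    )


if __name__ == "__main__":
    for (user, comm), count in sorted(fdb_census(list_processes()).items()):
        print(f"{count:3d}  {comm:<14} {user}")
```

Reading /proc/<pid>/status rather than parsing ps output avoids ps truncating long user names such as foundationdb.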
Sorry for the picture quality; it’s all I’ve got at the moment:
For comparison, here is what it looks like on all my other Linux VMs (which is also what the other 2 hosts installed onsite look like):
The only additional information I have is that the data folder is mounted on a different partition, and the admin forgot to chown that folder to the foundationdb user.
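Since the problem went away once the mounted partition had the right ownership, a pre-flight check along these lines could catch it before the service is started. This is a minimal Python sketch, not an official FoundationDB tool; find_wrong_owner is a hypothetical helper, and the path in the usage line (/var/lib/foundationdb/data, the usual package default) is an assumption, since I don't know where the partition was actually mounted onsite.

```python
import os
import pwd


def find_wrong_owner(root, expected_user="foundationdb"):
    """Return every path under root (root included) whose owner is not expected_user."""
    if not os.path.isdir(root):
        raise FileNotFoundError(root)
    expected_uid = pwd.getpwnam(expected_user).pw_uid  # KeyError if the user is missing
    wrong = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory appears once as dirpath; check it and the files it contains.
        for path in [dirpath] + [os.path.join(dirpath, name) for name in filenames]:
            if os.lstat(path).st_uid != expected_uid:  # lstat: don't follow symlinks
                wrong.append(path)
    return wrong


if __name__ == "__main__":
    # Assumed location: point this at the actual mount point of the data partition.
    for path in find_wrong_owner("/var/lib/foundationdb/data"):
        print("not owned by foundationdb:", path)
```

If it reports anything, running `chown -R foundationdb:foundationdb` on the mount point is the usual fix.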
Any idea what could cause fdbmonitor to spawn the correct number of identical copies of itself instead of fdbserver instances?