What impact does it have on the cluster when one machine becomes inaccessible?


I have an FDB cluster of 6 machines configured with three_data_hall, with 2 machines in each data hall.
I know that an FDB cluster keeps working when one machine fails, and in my tests it does.

However, after a sudden reboot of one machine, the cluster became unhealthy and reported the following:

$ fdbcli --exec "status json"
      "database_status" : {
            "available" : false,
            "healthy" : false
      },
      "messages" : [
            {
                "description" : "Unable to locate the data distributor worker.",
                "name" : "unreachable_dataDistributor_worker"
            },
            {
                "description" : "Unable to read database configuration.",
                "name" : "unreadable_configuration"
            }
      ]
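
The availability flags and message names above can also be pulled out by a script, which helps when watching for this state from monitoring. A minimal Python sketch, assuming the input is the text printed by `fdbcli --exec "status json"`; `summarize_status` is a hypothetical helper, and it searches the whole document instead of hard-coding where `database_status` and `messages` live, since the excerpt above does not show their parent objects.

```python
import json
from typing import Any, Iterator


def _walk(node: Any) -> Iterator[dict]:
    """Yield every JSON object (dict) in the parsed tree, depth first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def summarize_status(status_text: str) -> dict:
    """Collect the database_status flags and every message "name".

    Hypothetical helper: it does not assume a fixed nesting, so it picks up
    "messages" lists wherever they appear in the document.
    """
    doc = json.loads(status_text)
    summary = {"available": None, "healthy": None, "messages": []}
    for obj in _walk(doc):
        db = obj.get("database_status")
        if isinstance(db, dict):
            summary["available"] = db.get("available")
            summary["healthy"] = db.get("healthy")
        msgs = obj.get("messages")
        if isinstance(msgs, list):
            for msg in msgs:
                if isinstance(msg, dict) and "name" in msg:
                    summary["messages"].append(msg["name"])
    return summary
```

Searching the tree rather than fixed paths trades a little precision for robustness across FDB versions whose status layout may differ.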

Later I found out that firewalld on that machine was blocking all incoming packets, while outgoing packets seemed to pass. Once I stopped the fdb processes on that machine, the cluster status returned to healthy.
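
From one of the other machines, a plain TCP connect to the suspect machine's fdbserver port separates this one-way block from a clean stop: a machine that is up but has its fdb processes stopped usually answers with "connection refused", while a firewalld reject rule typically surfaces as "No route to host". A minimal Python sketch; `probe` is a hypothetical helper, and port 4500 (fdbserver's usual default) is an assumption, so adjust it to match foundationdb.conf.

```python
import errno
import socket


def probe(host: str, port: int = 4500, timeout: float = 2.0) -> str:
    """Try one TCP connect to host:port and classify the outcome.

    Returns "open", "refused", "unreachable", "timeout" or "error:<name>".
    The default port 4500 is an assumption; use the port from foundationdb.conf.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered with a reset: up, but nothing listening on the port.
        return "refused"
    except socket.timeout:
        # Packets silently dropped (e.g. a DROP rule) or host down.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # "No route to host", e.g. from a REJECT rule answering with ICMP.
            return "unreachable"
        name = errno.errorcode.get(exc.errno, "unknown") if exc.errno else "unknown"
        return "error:" + name
```

Running it from each of the other five machines against the suspect host makes the asymmetry visible: the suspect can still reach out, but every inbound probe fails.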

So it seems that shutting down a machine or stopping its fdb processes is handled fine, but a machine that is only half connected (incoming packets blocked, outgoing packets allowed) does affect cluster health.

When I stop the fdb processes on the host, only 5 machines appear in the output of fdbcli --exec "status json".
But while incoming packets were blocked by firewalld, all 6 machines appeared in that output. Some roles were even assigned to the unreachable machine, and the other 5 machines reported "No route to host" errors.
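
To see which roles were recruited onto the machine the others could not reach, the process list in `status json` can be grouped by host. A minimal Python sketch, assuming each entry under `cluster.processes` has an `address` ("ip:port", optionally followed by ":tls") and a `roles` list of objects with a `role` key; `roles_by_host` is a hypothetical helper, and missing fields are treated as empty.

```python
import json
from collections import defaultdict


def roles_by_host(status_text: str) -> dict:
    """Map each host IP to the sorted roles its processes currently hold.

    Assumes cluster.processes maps process ids to objects with "address"
    and "roles" fields; hosts whose processes hold no roles still appear.
    """
    doc = json.loads(status_text)
    processes = doc.get("cluster", {}).get("processes", {})
    result = defaultdict(set)
    for proc in processes.values():
        address = proc.get("address", "")
        if address.endswith(":tls"):
            # TLS listeners carry an extra ":tls" suffix after the port.
            address = address[: -len(":tls")]
        host = address.rsplit(":", 1)[0]  # drop the port
        roles = result[host]  # touching the key registers role-less hosts too
        for entry in proc.get("roles", []):
            if "role" in entry:
                roles.add(entry["role"])
    return {host: sorted(roles) for host, roles in result.items()}
```

Feeding it the output captured while the firewall was blocking traffic would show whether roles such as the data distributor ended up on the half-connected host.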

So my questions are:

  • Is there anything else I can do besides stopping FDB on the unreachable host?
  • How does the FDB cluster decide that a machine is alive before assigning it a role? And why doesn't the cluster exclude the inaccessible machine even when "No route to host" errors occur?

Thank you.