We recently encountered an unexpected cluster state change where 5 out of 7 servers were marked as excluded.
This was not an intentional maintenance event, and to our knowledge no exclude command (manual, scripted, or automated) was issued against the cluster. Once we re-included the servers, they rejoined the cluster without difficulty.
At the time of discovery:
- The cluster was still available
- Replication health was reported as healthy
- FoundationDB is running slowly
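
For anyone who wants to check for the same condition, here is a minimal sketch of how the excluded servers could be listed from the machine-readable status. It assumes `fdbcli` is on the PATH and that each entry under `cluster.processes` in the `status json` output carries an `excluded` boolean and an `address` string. The helper names `excluded_addresses` and `fetch_status` are hypothetical, chosen only for this illustration.

```python
import json
import subprocess
from typing import List, Optional


def excluded_addresses(status: dict) -> List[str]:
    """Return the addresses of processes that the status document marks as excluded.

    Assumes the layout cluster -> processes -> {process_id: {...}} with an
    "excluded" boolean per process. Falls back to the process id when a
    process entry has no "address" field.
    """
    processes = status.get("cluster", {}).get("processes", {})
    return sorted(
        proc.get("address", proc_id)
        for proc_id, proc in processes.items()
        if proc.get("excluded", False)
    )


def fetch_status(cluster_file: Optional[str] = None) -> dict:
    """Run `fdbcli --exec "status json"` and parse its standard output as JSON.

    Assumes stdout contains only the JSON document; adjust if your version
    prints additional text.
    """
    cmd = ["fdbcli"]
    if cluster_file:
        cmd += ["-C", cluster_file]  # use a specific cluster file
    cmd += ["--exec", "status json"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```

Calling `excluded_addresses(fetch_status())` on a host with access to the cluster should return the excluded addresses. Running `exclude` with no arguments inside `fdbcli` prints the current exclusion list as well, which is a quick way to cross-check the result.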
We are trying to understand how this exclusion could have occurred and what mechanisms might trigger it implicitly. Would the continuous backup have anything to do with this?