We created a 3-node FoundationDB cluster, with all 3 nodes acting as coordinators, using
single redundancy mode:
    fdb> coordinators 172.17.0.2:4500 172.17.0.3:4500 172.17.0.4:4500
    Coordination state changed
    fdb>
    fdb>
    fdb> status details

    Using cluster file `/etc/foundationdb/fdb.cluster'.

    Configuration:
      Redundancy mode        - single
      Storage engine         - memory-2
      Coordinators           - 3
      Usable Regions         - 1

    Cluster:
      FoundationDB processes - 3
      Zones                  - 3
      Machines               - 3
      Memory availability    - 8.0 GB per process on machine with least available
      Fault Tolerance        - 0 machines
      Server time            - 09/20/23 22:47:52

    Data:
      Replication health     - Healthy
      Moving data            - 0.000 GB
      Sum of key-value sizes - 0 MB
      Disk space used        - 315 MB

    Operating space:
      Storage server         - 1.0 GB free on most full server
      Log server             - 40.1 GB free on most full server

    Workload:
      Read rate              - 19 Hz
      Write rate             - 3 Hz
      Transactions started   - 7 Hz
      Transactions committed - 1 Hz
      Conflict rate          - 0 Hz

    Backup and DR:
      Running backups        - 0
      Running DRs            - 0

    Process performance details:
      172.17.0.2:4500  ( 1% cpu; 2% machine; 0.000 Gbps; 0% disk IO; 0.5 GB / 8.0 GB RAM )
      172.17.0.3:4500  ( 2% cpu; 2% machine; 0.000 Gbps; 0% disk IO; 0.5 GB / 8.0 GB RAM )
      172.17.0.4:4500  ( 1% cpu; 2% machine; 0.000 Gbps; 0% disk IO; 0.5 GB / 8.0 GB RAM )

    Coordination servers:
      172.17.0.2:4500  (reachable)
      172.17.0.3:4500  (reachable)
      172.17.0.4:4500  (reachable)

    Client time: 09/20/23 22:47:52

    fdb>
Now when we kill just 1 of the 3 nodes, the cluster becomes completely unavailable.
    vagrant@vagrant-1:~$ fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.

    The database is unavailable; type `status' for more information.

    Welcome to the fdbcli. For help, type `help'.
    fdb>
    fdb> status

    Using cluster file `/etc/foundationdb/fdb.cluster'.

    Could not communicate with all of the coordination servers.
      The database will remain operational as long as we
      can connect to a quorum of servers, however the fault
      tolerance of the system is reduced as long as the
      servers remain disconnected.

      172.17.0.2:4500  (reachable)
      172.17.0.3:4500  (reachable)
      172.17.0.4:4500  (unreachable)

    Locking coordination state. Verify that a majority of coordination server processes are active.
Why is this? It's single redundancy mode after all. Why is the cluster unavailable when 2 of the 3 nodes are still up?
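
To make the puzzle concrete, here is a minimal sketch of the quorum check that the status message seems to describe. It assumes "quorum" means a strict majority of the listed coordinators (the message's own wording, "a majority of coordination server processes", suggests this, but it is our assumption and not taken from FoundationDB's source). The function name has_quorum is hypothetical, and the reachability values are copied from the status output.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Return True if a strict majority of coordinators is reachable.

    Assumption: quorum == more than half of the coordinators.
    """
    return reachable >= total // 2 + 1


# Coordinator reachability as reported by `status` after killing one node.
coordinators = {
    "172.17.0.2:4500": True,
    "172.17.0.3:4500": True,
    "172.17.0.4:4500": False,
}

reachable = sum(coordinators.values())
print(has_quorum(reachable, len(coordinators)))
```

Under that assumption, 2 of 3 reachable coordinators is still a majority, so the coordinators alone do not seem to explain the outage. The only other hint we see is that the earlier status output already reported "Fault Tolerance - 0 machines".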