So we created a 3-node FDB cluster, with all 3 nodes acting as coordinators, using single
redundancy mode:
```
fdb> coordinators 172.17.0.2:4500 172.17.0.3:4500 172.17.0.4:4500
Coordination state changed
fdb>
fdb>
fdb> status details
Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - memory-2
  Coordinators           - 3
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 3
  Zones                  - 3
  Machines               - 3
  Memory availability    - 8.0 GB per process on machine with least available
  Fault Tolerance        - 0 machines
  Server time            - 09/20/23 22:47:52

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 0 MB
  Disk space used        - 315 MB

Operating space:
  Storage server         - 1.0 GB free on most full server
  Log server             - 40.1 GB free on most full server

Workload:
  Read rate              - 19 Hz
  Write rate             - 3 Hz
  Transactions started   - 7 Hz
  Transactions committed - 1 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  172.17.0.2:4500  ( 1% cpu; 2% machine; 0.000 Gbps; 0% disk IO; 0.5 GB / 8.0 GB RAM )
  172.17.0.3:4500  ( 2% cpu; 2% machine; 0.000 Gbps; 0% disk IO; 0.5 GB / 8.0 GB RAM )
  172.17.0.4:4500  ( 1% cpu; 2% machine; 0.000 Gbps; 0% disk IO; 0.5 GB / 8.0 GB RAM )

Coordination servers:
  172.17.0.2:4500  (reachable)
  172.17.0.3:4500  (reachable)
  172.17.0.4:4500  (reachable)

Client time: 09/20/23 22:47:52
fdb>
```
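For reference, this is how we understand the arithmetic behind `Coordinators - 3` and `Fault Tolerance - 0 machines`, written as a simplified Python model. It is our own assumption, not FoundationDB's actual status logic: the cluster tolerates the smaller of (a) the number of coordinator failures that still leave a strict majority and (b) the number of machine failures that still leave at least one copy of the data. The `REPLICAS` table and the function names are hypothetical.

```python
# Simplified model (our assumption, NOT FoundationDB's real status code):
# overall fault tolerance is the smaller of two independent limits.

# Copies of the data kept per redundancy mode, assumed to sit on distinct machines.
REPLICAS = {"single": 1, "double": 2, "triple": 3}


def coordinator_tolerance(num_coordinators: int) -> int:
    """Coordinator failures that still leave a strict majority alive."""
    return (num_coordinators - 1) // 2


def replica_tolerance(mode: str) -> int:
    """Machine failures that still leave at least one copy of every key."""
    return REPLICAS[mode] - 1


def fault_tolerance(mode: str, num_coordinators: int) -> int:
    """Hypothetical combined limit: the weaker of the two requirements wins."""
    return min(coordinator_tolerance(num_coordinators), replica_tolerance(mode))


print(coordinator_tolerance(3))      # 1
print(replica_tolerance("single"))   # 0
print(fault_tolerance("single", 3))  # 0
```

Under this model, three coordinators could survive losing one node, but a single copy of the data cannot, which would match the `0 machines` that status reports.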
Now, when we kill just 1 of the 3 nodes (172.17.0.4), the database becomes completely unavailable:
```
vagrant@vagrant-1:~$ fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb>
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.

Could not communicate with all of the coordination servers.
  The database will remain operational as long as we
  can connect to a quorum of servers, however the fault
  tolerance of the system is reduced as long as the
  servers remain disconnected.

  172.17.0.2:4500  (reachable)
  172.17.0.3:4500  (reachable)
  172.17.0.4:4500  (unreachable)

Locking coordination state. Verify that a majority of coordination server
processes are active.
```
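Applying the majority rule that fdbcli itself mentions to the coordinator list in that output, a minimal check says 2 of 3 reachable coordinators is a quorum, so the coordinator requirement on its own appears to be met. The helper `has_majority` is hypothetical; the addresses and reachability flags are copied from the status output.

```python
# Minimal strict-majority quorum check over the coordinators reported by
# `status` (addresses and reachable/unreachable flags copied from the output).
coordinators = {
    "172.17.0.2:4500": True,   # reachable
    "172.17.0.3:4500": True,   # reachable
    "172.17.0.4:4500": False,  # unreachable (the node we killed)
}


def has_majority(reachable: dict[str, bool]) -> bool:
    """True if strictly more than half of the coordinators are reachable."""
    alive = sum(reachable.values())  # True counts as 1, False as 0
    return alive > len(reachable) // 2  # e.g. 2 of 3, 3 of 5


print(has_majority(coordinators))  # True
```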
Why is this? It's single redundancy mode, after all. Why is the database unavailable when 2 of the 3 nodes, and therefore a majority of the coordinators, are still up?