After yesterday’s AWS outage, two of our five nodes were replaced, and now status reports “Replication health - UNHEALTHY: No replicas remain of some data”.
Full output:
# fdbcli --exec 'status details'
Using cluster file `/etc/foundationdb/fdb.cluster'.
Configuration:
  Redundancy mode - double
  Storage engine - ssd-2
  Coordinators - 5
  Usable Regions - 1

Cluster:
  FoundationDB processes - 10
  Zones - 5
  Machines - 5
  Memory availability - 7.7 GB per process on machine with least available
  Retransmissions rate - 2 Hz
  Fault Tolerance - 0 machines
  Server time - 06/11/21 11:25:10

Data:
  Replication health - UNHEALTHY: No replicas remain of some data
  Moving data - unknown
  Sum of key-value sizes - unknown
  Disk space used - 97.738 GB

Operating space:
  Storage server - 61.7 GB free on most full server
  Log server - 61.7 GB free on most full server

Workload:
  Read rate - 27 Hz
  Write rate - 1 Hz
  Transactions started - 14 Hz
  Transactions committed - 1 Hz
  Conflict rate - 0 Hz

Backup and DR:
  Running backups - 0
  Running DRs - 0

Process performance details:
  10.165.196.94:4500 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.5 GB / 7.8 GB RAM )
  10.165.196.94:4501 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.6 GB / 7.8 GB RAM )
  10.165.196.116:4500 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.6 GB / 7.7 GB RAM )
  10.165.196.116:4501 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.4 GB / 7.7 GB RAM )
  10.165.196.148:4500 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.3 GB / 7.8 GB RAM )
  10.165.196.148:4501 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.4 GB / 7.8 GB RAM )
  10.165.196.176:4500 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.6 GB / 7.7 GB RAM )
  10.165.196.176:4501 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.4 GB / 7.7 GB RAM )
  10.165.196.234:4500 ( 0% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.3 GB / 7.7 GB RAM )
  10.165.196.234:4501 ( 1% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.4 GB / 7.7 GB RAM )

Coordination servers:
  10.165.196.94:4500 (reachable)
  10.165.196.116:4500 (reachable)
  10.165.196.148:4500 (reachable)
  10.165.196.176:4500 (reachable)
  10.165.196.234:4500 (reachable)

Client time: 06/11/21 11:25:08
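The same information is also available in machine-readable form from `fdbcli --exec 'status json'`. A minimal Python sketch (the helper name is hypothetical) that groups the processes currently holding a storage role by machine could help show whether the two replaced machines have been recruited as storage servers again. It assumes the 6.2 `status json` layout, where each entry under `cluster.processes` has `address`, `machine_id` and a `roles` list of objects with a `role` key; please check these field names against your own output.

```python
from __future__ import annotations

from collections import defaultdict


def storage_servers_by_machine(status: dict) -> dict[str, list[str]]:
    """Map each machine_id to the sorted addresses of its storage-role processes.

    `status` is the parsed output of `fdbcli --exec 'status json'`
    (field names assumed from the 6.2 schema, not verified here).
    """
    by_machine: dict[str, list[str]] = defaultdict(list)
    for proc in status.get("cluster", {}).get("processes", {}).values():
        # Each process lists its current roles, e.g. [{"role": "storage", ...}]
        roles = {r.get("role") for r in proc.get("roles", [])}
        if "storage" in roles:
            machine = proc.get("machine_id", "?")
            by_machine[machine].append(proc.get("address", "?"))
    return {machine: sorted(addrs) for machine, addrs in by_machine.items()}
```

For example, feed it `json.loads(...)` of the `status json` output; a machine that is missing from the result has no storage server at the moment.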
If I stop one node, ‘Replication health’ changes to ‘(Re)initializing automatic data distribution’, then to ‘Healthy’, and then, after some time, back to ‘UNHEALTHY: No replicas remain of some data’.
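To record this sequence of states with timestamps instead of re-running `status details` by hand, a minimal polling sketch could look like the following. It assumes `fdbcli` is on the PATH and that `status json` exposes the replication state under `cluster.data.state` with `name` (for example `missing_data` or `healthy`), `min_replicas_remaining` and `description`; the function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess
import time


def replication_state(status: dict) -> tuple[str, int | None, str]:
    """Return (name, min_replicas_remaining, description) from parsed `status json`."""
    state = status.get("cluster", {}).get("data", {}).get("state", {})
    return (
        state.get("name", "unknown"),
        # May be absent, e.g. while data distribution is (re)initializing
        state.get("min_replicas_remaining"),
        state.get("description", ""),
    )


def fetch_status(cluster_file: str = "/etc/foundationdb/fdb.cluster") -> dict:
    """Run `fdbcli --exec 'status json'` against the given cluster file and parse it."""
    out = subprocess.run(
        ["fdbcli", "-C", cluster_file, "--exec", "status json"],
        check=True, capture_output=True, text=True, timeout=60,
    ).stdout
    return json.loads(out)


def watch(interval_s: float = 10.0) -> None:
    """Poll forever and print a timestamped line whenever the replication state changes."""
    last = None
    while True:
        current = replication_state(fetch_status())
        if current != last:
            print(time.strftime("%H:%M:%S"), *current)
            last = current
        time.sleep(interval_s)
```

Calling `watch()` on a host that has `fdbcli` and the cluster file, then stopping a node, should print one line per transition.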
# fdbserver --version
FoundationDB 6.2 (v6.2.28)
source version 569ab46bf638cd0bfc86f192b724c9217e090760
protocol fdb00b062010001
Is there a way to fix it?