Current state of the cluster: status shows it as down, with zero transactions. However, all 24 nodes are accessible and the fdb process is running on every node.
Issue: the cluster went down because a log server's filesystem filled up, so we deleted some of the logqueue-*.fdq files under the /parent/data/4500/data directory and restarted the service. The same thing happened on another node, and we repeated the same steps there. The cluster came back up and was shown as rebalancing, and then all of a sudden it went down. Please suggest.
I have verified that all nodes have >50% of disk space free on that partition, and free RAM too. All I can see in the log files are connectivity errors, even though connectivity is not an issue at all. Would restarting the fdb service on all nodes help? Do we need to create the database again?
Status details shows the 7 coordinator nodes as reachable, plus the following 2 messages:
Unable to locate the data distributor worker.
Unable to locate the ratekeeper worker.
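For reference, the same check can be made against the machine-readable output of `fdbcli --exec 'status json'`. A minimal sketch, assuming the status document lists each process's roles under `cluster.processes.<id>.roles[].role` and uses the role names `ratekeeper` and `data_distributor`; the helper name `missing_roles` and the trimmed sample document are hypothetical, so check the field names against the output of your own FDB version.

```python
import json

# Role names assumed to appear in `status json`; verify against your version.
REQUIRED_ROLES = {"ratekeeper", "data_distributor"}

def missing_roles(status, required=REQUIRED_ROLES):
    """Return the required roles that no process in the status document reports."""
    processes = status.get("cluster", {}).get("processes", {})
    seen = {
        role.get("role")
        for proc in processes.values()
        for role in proc.get("roles", [])
    }
    return sorted(set(required) - seen)

# Hypothetical, heavily trimmed status document for illustration only.
sample = json.loads("""
{"cluster": {"processes": {
    "p1": {"address": "10.0.0.1:4500", "roles": [{"role": "log"}]},
    "p2": {"address": "10.0.0.2:4500", "roles": [{"role": "storage"}]}
}}}
""")

print(missing_roles(sample))
```

With a real cluster, the dictionary would come from parsing the fdbcli output with `json.loads` instead of the inline sample.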
Which FDB version are you using? Can you verify that Ratekeeper is running (by searching the logs for Roles that include RK)? If Ratekeeper is missing, no transaction can start.
If the problem is indeed a missing Ratekeeper, you may want to add stateless processes. Ratekeeper is recruited automatically (by the ClusterController in 7.1), so I don’t expect a missing Ratekeeper to happen. It’s best to rule it out first. Maybe it’s just a network issue?
Restarting all processes is safe to do.
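One way to do that log search, as a rough sketch: it assumes the trace files are in the default XML format and that events carry a `Roles="..."` attribute holding comma-separated abbreviations such as `RK`. The function name and the sample lines are hypothetical.

```python
import re

# Matches the Roles attribute of an XML trace event, e.g. Roles="CC,RK".
ROLES_ATTR = re.compile(r'Roles="([^"]*)"')

def lines_with_role(lines, role="RK"):
    """Yield trace lines whose Roles attribute lists the given abbreviation."""
    for line in lines:
        match = ROLES_ATTR.search(line)
        if match and role in match.group(1).split(","):
            yield line

# Hypothetical trace lines for illustration only.
sample = [
    '<Event Severity="10" Type="ProcessMetrics" Machine="10.0.0.1:4500" Roles="CC,RK" />',
    '<Event Severity="10" Type="ProcessMetrics" Machine="10.0.0.2:4500" Roles="SS" />',
]

hits = list(lines_with_role(sample))
print(len(hits))
```

To run it on real data, pass an open trace file (or any iterable of lines) instead of `sample`. If no process on any node reports RK, that supports the missing-Ratekeeper theory.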
You’ve deleted transaction log files out from underneath FoundationDB, and transaction logs from old TLog generations aren’t re-replicated on machine loss if they’re removed (they’re expected to be transient anyway). If you deleted transaction log files from as many TLogs as your TLog replication factor, then you’ve basically put your cluster into an unrecoverable state by removing every copy of some piece of data. Your only way forward from that point would be to restore the .fdq files, if you still have them, or to restore the cluster from a backup.
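The condition above comes down to simple arithmetic. A minimal sketch, assuming each piece of log data is held by `replication_factor` TLogs, so that every copy can only be gone once at least that many TLogs have had files deleted. The function name and the example numbers are hypothetical, and the check cannot know which TLogs actually held which data.

```python
def all_copies_possibly_lost(tlogs_with_deleted_files, replication_factor):
    """True if enough TLogs lost files that every copy of some data may be gone."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return tlogs_with_deleted_files >= replication_factor

# Example: files were deleted on 2 TLogs.
print(all_copies_possibly_lost(2, 3))  # False under the stated assumption
print(all_copies_possibly_lost(2, 2))  # True: both copies may be gone
```

A `True` result only means loss is possible, not certain. It depends on whether the deleted files on those TLogs covered the same data.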