Fdb database is unavailable

bharathrani1980 · August 29, 2023, 6:28pm

Current state of the cluster: status shows it as down, with zero transactions. However, all 24 nodes are accessible and the fdb process is running on every node.

Issue: The cluster went down because the log server filesystem filled up, so we deleted some of the logqueue-*.fdq files under the /parent/data/4500/data directory and restarted the service. The same thing happened on another node, and we repeated the same steps there. The cluster came back up and was shown as rebalancing, and then all of a sudden it went down. Please suggest how to proceed.

I have verified that all nodes have more than 50% of disk space available on that partition, and free RAM too. All I can see in the log files are connectivity errors, even though connectivity is not an issue at all. Would restarting the fdb service on all nodes help? Do we need to create the database again?

Status details shows the 7 coordinator nodes as reachable, plus the following 2 messages:
Unable to locate the data distributor worker.
Unable to locate the ratekeeper worker.
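
The two messages above can also be cross-checked against the machine-readable status. The following is a minimal sketch, assuming the output of `fdbcli --exec "status json"` has been saved and parsed, and that each entry under `cluster.processes` carries a `roles` list whose items have a `role` name. The helper names and the sample document are hypothetical and only illustrate the shape of the check.

```python
import json

def recruited_roles(status):
    """Collect every role name reported by any process in a parsed status document."""
    processes = status.get("cluster", {}).get("processes", {})
    roles = set()
    for proc in processes.values():
        for entry in proc.get("roles", []):
            name = entry.get("role")
            if name:
                roles.add(name)
    return roles

def missing_roles(status, required=("ratekeeper", "data_distributor")):
    """Return the required roles that no process currently reports."""
    found = recruited_roles(status)
    return [role for role in required if role not in found]

# Synthetic sample (not taken from the cluster in this thread).
sample = json.loads("""
{
  "cluster": {
    "processes": {
      "proc-a": {"address": "10.0.0.1:4500", "roles": [{"role": "storage"}]},
      "proc-b": {"address": "10.0.0.2:4500", "roles": [{"role": "log"},
                                                       {"role": "data_distributor"}]}
    }
  }
}
""")

print(missing_roles(sample))
```

An empty list would mean both roles are reported by some process. A non-empty list only says the status document does not show them, not why they are missing.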

jzhou · August 29, 2023, 9:58pm

Which FDB version are you using? Can you verify that Ratekeeper is running (by searching the logs for Roles that include RK)? If Ratekeeper is missing, no transaction can start.
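
The log search could be sketched as below. This assumes the trace logs are in the XML format and that events carry a `Roles="..."` attribute holding comma-separated abbreviations such as `RK`. The function name and the sample lines are hypothetical, so adjust the pattern if your trace files look different.

```python
import re

# Matches the Roles attribute of an XML trace event, e.g. Roles="RK,SS".
ROLES_ATTR = re.compile(r'\bRoles="([^"]*)"')

def lines_with_role(lines, abbrev="RK"):
    """Return the trace lines whose Roles attribute lists the given abbreviation."""
    hits = []
    for line in lines:
        match = ROLES_ATTR.search(line)
        if match and abbrev in match.group(1).split(","):
            hits.append(line.rstrip("\n"))
    return hits

# Synthetic lines for illustration only.
sample_lines = [
    '<Event Severity="10" Type="ProcessMetrics" Machine="10.0.0.1:4500" Roles="SS" />\n',
    '<Event Severity="10" Type="ProcessMetrics" Machine="10.0.0.2:4500" Roles="RK,SS" />\n',
    '<Event Severity="10" Type="ProcessMetrics" Machine="10.0.0.3:4500" Roles="TL" />\n',
]

for hit in lines_with_role(sample_lines):
    print(hit)
```

In practice `lines` would be an open trace file from each process. If no line from any process matches, that is consistent with Ratekeeper not having been recruited.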

If the problem is indeed a missing Ratekeeper, you may want to add stateless processes. Ratekeeper is recruited automatically (by the ClusterController in 7.1), and I don’t expect a missing Ratekeeper to happen. So it’s best to rule it out first. Maybe it’s just a network issue?

Restarting all processes is safe to do.

alexmiller · August 30, 2023, 12:16am

You’ve deleted transaction log files out from underneath FoundationDB, and transaction logs from old TLog generations aren’t re-replicated on machine loss if they’re removed (as they’re expected to be transient anyway). If you deleted transaction log files from as many TLogs as your TLog replication factor, then you’ve basically put your cluster into an unrecoverable state by removing all copies of some piece of data. Your only way forward from that point would be to restore the .fdq files, if you still have them, or to restore the cluster from a backup.
