An FDB cluster is unavailable, and the messages section shows "RecoveryStoppedTooManyOldGenerations". We added the --knob_max_generations_override parameter to the server processes, but the problem persists; the only change is that "OldGenerations"="100" became "OldGenerations"="102".
Any thoughts on how we can bring this cluster back to a healthy state?
"messages" : [
    {
        "description" : "RecoveryStoppedTooManyOldGenerations at Thu Mar 21 18:02:31 2024",
        "name" : "process_error",
        "raw_log_message" : "\"Severity\"=\"40\", \"Time\"=\"1711044151.846410\", \"Type\"=\"RecoveryStoppedTooManyOldGenerations\", \"ID\"=\"0000000000000000\", \"OldGenerations\"=\"100\", \"Reason\"=\"Recovery stopped because too many recoveries have happened since the last time the cluster was fully_recovered. Set --knob_max_generations_override on your server processes to a value larger than OldGenerations to resume recovery once the underlying problem has been fixed.\", \"Backtrace\"=\"addr2line -e fdbserver.debug -p -C -f -i 0x19d19fc 0x19d11f8 0x19d12c1 0xf33cce 0xf363ff 0x6be078 0xf56503 0xf56673 0xa01890 0xa01ff3 0x9fbdf8 0x9fbfba 0x9fb2a8 0x9fb75a 0x9fb8e6 0x6be078 0x732d1e 0x6be078 0x9fb2a8 0x9f1ac3 0x9fb2a8 0x9f1f15 0x9fb2a8 0x9f2bad 0x9fb2a8 0x9f0c51 0x9fb2a8 0xa0438b 0x19212db 0x19213d5 0x7fbad0 0x1a0f3d0 0x6746e2 0x7fec95c4b083\", \"Machine\"=\"xx.xx.xx.xx:4300\", \"LogGroup\"=\"default\", \"Roles\"=\"MS\"",
        "time" : 1711040000,
        "type" : "RecoveryStoppedTooManyOldGenerations"
    }
],
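
The Reason text in the message says the knob has to be set to a value larger than OldGenerations, so it may help to pull that number out of raw_log_message and compare it with whatever value was configured. Below is a minimal sketch of that check, not an official FDB tool. The helper names (parse_raw_log_message, knob_is_sufficient) and the shortened sample string are made up for illustration, and the parsing assumes the "Key"="Value" layout seen in the message above, with no quote characters inside values.

```python
import re
from typing import Dict


def parse_raw_log_message(raw: str) -> Dict[str, str]:
    """Split a raw_log_message string into a dict of its "Key"="Value" pairs.

    Assumes values contain no embedded double quotes, as in the message above.
    """
    return dict(re.findall(r'"(\w+)"="([^"]*)"', raw))


def knob_is_sufficient(raw: str, knob_value: int) -> bool:
    """Return True if knob_value is strictly larger than the reported OldGenerations."""
    fields = parse_raw_log_message(raw)
    return knob_value > int(fields["OldGenerations"])


# Shortened, hypothetical sample in the same layout as the message above
# (after JSON decoding, so the backslash escapes are already removed).
sample = (
    '"Severity"="40", "Type"="RecoveryStoppedTooManyOldGenerations", '
    '"OldGenerations"="100", "Machine"="xx.xx.xx.xx:4300", "Roles"="MS"'
)

if __name__ == "__main__":
    fields = parse_raw_log_message(sample)
    print("OldGenerations:", fields["OldGenerations"])
    print("Is a knob value of 101 large enough?", knob_is_sufficient(sample, 101))
```

Since OldGenerations went from 100 to 102 after the knob was added, the comparison should be repeated against the latest message rather than the first one.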