fdbcli> status json includes the process class settings and role recruitments.
See https://pastebin.com/kj5XCNPM (from "Are spikes of 500ms+ MaxRowReadLatency normal?") for an example.
"54a7a3995096944c1ecb563e81ff61d9" : {
"class_source" : "command_line",
"class_type" : "storage",
[snip]
"roles" : [
{
"role" : "storage",
[snip]
}
],
},
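If you want to pull just those fields out of a large status document, something like the following might help. It is only a sketch: it assumes the per-process entries sit under cluster.processes in the status json output, and the function name summarize_process_classes and the trimmed sample document are made up for illustration.

```python
import json


def summarize_process_classes(status):
    """Return {process_id: (class_type, class_source, [role names])}."""
    # Assumption: process entries live under cluster.processes.
    processes = status.get("cluster", {}).get("processes", {})
    summary = {}
    for process_id, info in processes.items():
        # Each entry in "roles" is an object with a "role" name.
        roles = [r.get("role") for r in info.get("roles", [])]
        summary[process_id] = (
            info.get("class_type"),
            info.get("class_source"),
            roles,
        )
    return summary


# Hypothetical, trimmed-down document shaped like the excerpt above.
sample = json.loads("""
{"cluster": {"processes": {
  "54a7a3995096944c1ecb563e81ff61d9": {
    "class_source": "command_line",
    "class_type": "storage",
    "roles": [{"role": "storage"}]
  }
}}}
""")

for pid, (class_type, class_source, roles) in summarize_process_classes(sample).items():
    print(pid, class_type, class_source, roles)
```

To run it against a real cluster, you could save the output of fdbcli --exec "status json" to a file and load that with json.load instead of using the sample.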
If you remove Replication Factor or more machines from a cluster without first excluding them and waiting for the exclude to finish, then you’re going to break your cluster, because there’s data (including system metadata) that will be permanently missing.
The locking_coordinated_state recovery step also waits for the previous generation of TLogs to come back, so that we can read out the system metadata. As you’ve removed >= Replication Factor machines, that’s never going to finish.
(I’ve also been confused by this naming, so maybe we should go rename this step sometime…)
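To make the removal rule concrete, here is a deliberately simplified model of it. It is not how FoundationDB itself decides anything. It only restates the condition described above, and the function name and its arguments are hypothetical.

```python
def removal_may_lose_data(removed, excluded_done, replication_factor):
    """Return True if removing these machines may permanently lose data.

    removed: ids of machines taken out of the cluster
    excluded_done: ids of machines whose exclude had already finished
    replication_factor: number of copies kept of each piece of data
    """
    # A machine whose exclude finished no longer holds data, so only the
    # other removed machines count against the replication factor.
    unsafe = set(removed) - set(excluded_done)
    return len(unsafe) >= replication_factor


# One machine pulled from a single-replica cluster with no exclude.
print(removal_may_lose_data({"m1"}, set(), 1))
# Three machines pulled from a three-replica cluster, one excluded first.
print(removal_may_lose_data({"m1", "m2", "m3"}, {"m1"}, 3))
```

The first call models the unsafe case from this thread, and the second stays below the threshold because one of the three machines had finished its exclude.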
I’m confused, though, because fdbcli> configure single ssd shouldn’t bring you back to a working cluster. Running fdbcli> configure new single ssd, and thus throwing away the previous database, might. Did you happen to elide the "new" by accident when posting, or should I go think harder?