If you remove Replication Factor machines or more from a cluster without excluding them first and waiting for the exclude to finish, then you're going to break your cluster, because there's data (including system metadata) that will be permanently missing.
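For reference, a safe removal goes roughly like this (the address is just a placeholder for whichever process you want to retire):

fdbcli> exclude 10.0.0.1:4500
fdbcli> status

As I understand it, exclude waits until all data on that process has been moved elsewhere; only once it returns, and status reports the cluster healthy, is it safe to actually stop the process.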
I didn't remove any machines, I removed processes! Looking at the docs, I don't see how "1 machine, 1 process" suddenly stops being a workable state just because I temporarily increase the number of processes and then remove them again.
In my understanding I don't have any replication here to begin with. If I am violating something, it is invisible to me and the docs don't mention it.
I don't see any indication that I'm violating something called a 'Replication Factor' when reading this: Configuration — FoundationDB 7.1
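For reference, the redundancy modes that page lists are set with configure (if 'single' really means only one copy of the data, that isn't obvious to me from the wording there):

fdbcli> configure single ssd
fdbcli> configure double ssd
fdbcli> configure triple ssd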
I’m confused though that
fdbcli> configure single ssd
shouldn't bring you back to a working cluster. Running

fdbcli> configure new single ssd

and thus throwing away the previous database might? Did you happen to elide the new by accident when posting, or should I go think harder?
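To spell out the difference as I understand it:

fdbcli> configure single ssd      (reconfigures the database you already have)
fdbcli> configure new single ssd  (creates a brand-new, empty database, discarding the old one)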
Well, today I tried it again and now it seems completely arbitrary to me. Sometimes it locks up even if I just remove 1 of 4 processes, sometimes only when I remove 2. It definitely always happens when I remove 3 at the same time.
I made two videos to show the behavior (at first it seems fine, the processes just say "no metrics available", but seconds later the whole thing goes into error mode):
https://webm.red/Fyct
In the second video I can't believe that it's working again at first, but then I try removing 2 at the same time and it goes into error mode again:
https://webm.red/u80p
I can't replicate the behavior of the 'configure single ssd' thing again, but I do believe that it happened just like I said.
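Next time it locks up I'll try to capture the cluster state before and after removing a process, something like:

fdbcli> status details

which, as far as I know, lists every process along with any errors it is reporting, so the videos aren't the only evidence.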