How do I remove a node from an FDB cluster? For example, if a machine in the cluster gets terminated in my AWS cloud and it happens to be one of the coordinators, how do I deal with this situation?
It’s a 3-node FDB cluster.
See removing a machine from a cluster for the instructions on how to safely remove a machine. As mentioned there, the exclude command doesn’t relocate coordinators, so you’ll need to explicitly change coordinators.
As it sounds like you’ve already lost a machine in your cluster, I’ll assume that with a three node FDB cluster, you were using double replication and had three coordinators. Then you should be able to just add another machine and change to having a new set of three available coordinators, and data distribution should take care of restoring two full copies of all of your data.
Thanks for your immediate reply. Here is the series of steps I assume:
- 3-node cluster, all 3 are coordinators
- 1 node lost
- Change coordinators on the existing nodes accordingly
- Spin up a new node with all packages
How will the new node join the existing 2 nodes?
I think the easiest and safest ordering would be to:
- Create and configure a new machine that will run the new coordinator, but don’t start it yet
- Change coordinators in existing nodes to include the new node
- Start the new coordinator node using the new cluster file written from one of the other two nodes, so it will both become a coordinator, and join the new cluster as a worker.
Before the second point, we need to change the coordinators on the 2 remaining nodes to remove the dead node first, correct?
You could, and instead do:
- Change coordinators to the two still-live machines
- Create and configure a new machine that will run the new coordinator
- Start the new process with a cluster file for the current two coordinator cluster
- Change the coordinators to be all three machines
Which would work equally well. I just elided an extra step by suggesting that you could configure the coordinators only once to include the ip:port where a coordinator will be once you start it.
If you were to write automation to do this, I’d probably expect it to look like
- Create and configure a new machine, and start it to join the current cluster (that has a dead coordinator).
- Change the coordinators to be only the three machines that are now alive.
… which I now realize works equally well manually, and is even one more step shorter.
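If you do end up scripting that last step, a minimal Python sketch of building the fdbcli call might look like the following. It only assembles the command; it does not run it. The cluster file path and the 10.0.0.x:4500 addresses are hypothetical placeholders, and the helper name coordinators_command is made up for illustration. The -C and --exec options and the coordinators command are standard fdbcli features.

```python
import shlex


def coordinators_command(cluster_file, addresses):
    """Build the argv for an fdbcli call that sets the coordinator list.

    Hypothetical helper for illustration; it does not contact the cluster.
    """
    if not addresses:
        raise ValueError("at least one coordinator address is required")
    # fdbcli reads the cluster file given with -C and runs the single
    # command passed with --exec.
    command = "coordinators " + " ".join(addresses)
    return ["fdbcli", "-C", cluster_file, "--exec", command]


# Assumed values: the two surviving machines plus the replacement machine.
cluster_file = "/etc/foundationdb/fdb.cluster"
alive = ["10.0.0.1:4500", "10.0.0.2:4500", "10.0.0.4:4500"]

argv = coordinators_command(cluster_file, alive)
print(shlex.join(argv))  # shell-quoted form, handy for logging
```

On a machine with fdbcli installed, the list could be handed to subprocess.run(argv, check=True) once the new machine has joined the cluster.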
OK, sure, thanks, will try these.
Sorry, one question on point 1 of the second method:
- Create and configure a new machine, and start it to join the current cluster (that has a dead coordinator).
When I configure the new machine and start it, how will it join the current cluster? How will it know?
Give it the same cluster file that the other two machines have. It’s going to be a cluster file where one of the coordinators is dead, but that’s fine, as 2/3 are alive and you still have quorum.
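To make the quorum point concrete, here is a rough Python sketch that reads the coordinator list out of a cluster file’s contents and checks whether a majority is still alive. The sample connection string, the addresses, and the function names are all made up for illustration; the only thing assumed about the format is the usual description:ID@ip:port,ip:port,... layout, with lines starting with # treated as comments.

```python
def parse_coordinators(cluster_file_text):
    """Return the coordinator addresses from cluster file contents."""
    for line in cluster_file_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Everything after the first "@" is the comma-separated address list.
        _, _, addresses = line.partition("@")
        return [a.strip() for a in addresses.split(",") if a.strip()]
    raise ValueError("no connection string found")


def has_quorum(coordinators, dead):
    """True if a strict majority of the coordinators is still alive."""
    alive = [c for c in coordinators if c not in dead]
    return len(alive) > len(coordinators) // 2


# Hypothetical cluster file contents with three coordinators.
text = "mycluster:abc123@10.0.0.1:4500,10.0.0.2:4500,10.0.0.3:4500\n"
coords = parse_coordinators(text)

print(has_quorum(coords, dead={"10.0.0.3:4500"}))                   # one dead
print(has_quorum(coords, dead={"10.0.0.2:4500", "10.0.0.3:4500"}))  # two dead
```

With one of three coordinators dead the check passes, which is why the new machine can still join using the old cluster file; with two dead it would not.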
The 2nd method works perfectly fine, thanks a lot.