How to redistribute fdbserver process roles in an existing cluster


I have a three-node fdb cluster. Each node runs two fdbserver processes with no process class specified in foundationdb.conf. There is already some data in the database.
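For reference, the per-node foundationdb.conf in this setup looks roughly like the sketch below; the ports and paths are the package defaults and purely illustrative. With no class option, both processes run with an unset class and can be recruited for any role:

```ini
[fdbserver]
command = /usr/sbin/fdbserver
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb

# Two processes per node; no "class" option, so the cluster
# may recruit any role (storage, log, stateless) on them.
[fdbserver.4500]
[fdbserver.4501]
```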

Now I want to add three more nodes with two processes each, but I’d like to run only stateless and transaction process classes on the new nodes (one stateless and one transaction process per node). Then I want to keep only storage processes on the old three nodes (two storage processes on each).

I could build a new, empty six-node cluster with the desired role distribution from scratch, but I’d like to reconfigure the existing cluster without data loss. Is that possible? If so, where can I find step-by-step instructions for this kind of reconfiguration?

You should be able to add the new processes to your cluster and configure their process classes as desired. At that point, I believe the cluster will automatically adjust its recruitments to place the stateless processes and transaction logs on your new hosts. If that doesn’t happen automatically, restarting the cluster should force a recovery.
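As a sketch, the new hosts’ foundationdb.conf would pin the class per process; the ports here are illustrative:

```ini
# On each of the three new nodes:
[fdbserver.4500]
class = stateless

[fdbserver.4501]
class = transaction
```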

You can then change the process class of your existing processes to storage. Technically, I think you could do this at any time and the cluster would continue to work (storage processes can be used for other purposes if there are no alternatives). If you intend to do this one process at a time rather than all at once, it may be a little better to do it second, just to avoid some sub-optimal recruitment decisions for non-storage roles.
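The storage change on the original nodes is then just a class line per existing fdbserver section (ports illustrative), followed by a restart of the local processes:

```ini
# On each of the three original nodes:
[fdbserver.4500]
class = storage

[fdbserver.4501]
class = storage
```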

Thanks a lot. I managed to do it.

First, I changed the class of the existing processes from unspecified to storage in the configuration. The cluster continued working.

Second, I added the stateless and transaction processes on the new hosts. The transaction logs seem to have moved to the new processes automatically, because their files appeared on the new hosts and disappeared from the old ones. Unfortunately, I could not tell where the proxies were running, since they have no files.

Third, I changed the coordinators from the old processes to the new stateless ones with the coordinators command. The new coordinator state files appeared on the new hosts but did not disappear from the old ones. status details displayed the new list.
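For anyone following along, the coordinator change is issued from fdbcli; the addresses below are placeholders for the new stateless processes, and coordinators auto would let the cluster choose a set itself:

```shell
fdbcli --exec 'coordinators 10.0.4.1:4500 10.0.4.2:4500 10.0.4.3:4500'
```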

After restarting the cluster, the old coordinator state still existed. fdbcli status from any client machine complained that the list of coordinators in the cluster file was wrong. It seems coordinators were running on both the new and the old hosts, but status showed only the new ones.

Only after I stopped the cluster, removed the extra coordinator state files, and started the cluster again did everything become all right.
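Roughly, the cleanup on each old coordinator host looked like this; the service name, data directory, and the coordination-* file pattern assume the stock Linux packages and are worth verifying locally first:

```shell
# Stop the local fdbserver processes, remove the stale
# coordination state, then start them again.
sudo service foundationdb stop
sudo rm /var/lib/foundationdb/data/4500/coordination-*
sudo service foundationdb start
```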

The coordination files do not get deleted by a coordinator change, but that should be fine. As for the complaint about the cluster file on a client machine: that probably means the cluster file still had the old connection string in it when you tried to connect, and fdbcli was unable to update it. This could be because of insufficient write privileges on the cluster file.
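A quick way to check that on the client machine, assuming the default cluster file location:

```shell
ls -l /etc/foundationdb/fdb.cluster
# fdbcli rewrites this file when coordinators change, so the user
# running it needs write access; the stock packages use:
sudo chown foundationdb:foundationdb /etc/foundationdb/fdb.cluster
sudo chmod 664 /etc/foundationdb/fdb.cluster
```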

Thank you for reply.

How can I make another reconfiguration? I have some processes without service_class specified. I want to add more transaction and storage processes and convert the existing processes to stateless.

I believe the most straightforward way to do this would be to add your new processes with the configured classes and either simultaneously or afterward change the class of your existing processes to stateless.

The reason this works is that the stateless-class processes will still recover their data files and operate as storage servers when they start up. They will be marked as unwanted storage servers, so the cluster will quickly start moving data away from them to other, more desirable storage servers. When this is complete, the storage role will be terminated. For the transaction logs, new ones will be recruited elsewhere and will recover from the old ones located on the now-stateless processes.
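Concretely, the conversion is again a one-line class change per existing fdbserver section (ports illustrative); after the processes restart with the new class, the data movement away from them can be followed from fdbcli:

```shell
# foundationdb.conf on the old hosts:
#   [fdbserver.4500]
#   class = stateless
#
# Then watch the old storage servers drain:
fdbcli --exec 'status details'
```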

If you wanted to be more conservative, you could potentially first exclude the unset processes, wait for that to complete, and then switch them to stateless. If you do this all at once, your stateless processes would end up on your new processes in a configuration that may not be what you want. You could instead do it one-by-one, or you could add new stateless processes and just remove the unset ones.
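The conservative variant would use fdbcli’s exclude and include commands; the addresses below are placeholders for the old unset-class processes:

```shell
# Exclude the old processes; fdbcli waits until data movement
# away from them has finished.
fdbcli --exec 'exclude 10.0.1.1:4500 10.0.1.1:4501'
# After changing their class in foundationdb.conf and restarting
# them, re-admit them to the cluster:
fdbcli --exec 'include 10.0.1.1:4500 10.0.1.1:4501'
```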

Great. Yes, it works!

The simplest way is to restart the cluster with the new configuration files. Because all the other processes are present in the new configuration, the cluster starts working even though their service_class has been changed.

Because the old processes remain in the cluster as stateless processes, no coordinator change is required.

If you wanted to be more conservative, you could potentially first exclude the unset processes, wait for that to complete, and then switch them to stateless.

I tried that way first, but I didn’t manage to make it work. After I excluded the old processes, I couldn’t find a way to include them again, even with a different service_class.

So I found the automatic conversion easier.