How to avoid database partitioning while changing redundancy mode from single/1 machine to double/3 machines

Hello :wave:

Context:

I plan to change the redundancy of an in-production 6.3.23 FDB database from single on one machine to double with 3 machines :rocket:.

I did some tests locally with docker containers and everything worked fine so far :+1:

Still, I have concerns regarding the changing of redundancy.

Problem:

First, I have to add additional machines to the cluster before changing the redundancy mode.

Indeed, running fdbcli --exec "configure double" without first adding 2 machines raises the error "Not enough processes exist to support the specified configuration".

As a result, I must add these machines first. But the consequence of adding machines while the database is in single redundancy mode is that FDB should start partitioning the data across them, as the documentation says :face_with_raised_eyebrow::

single mode will work with clusters of two or more computers and will partition data for increased performance but the cluster will not tolerate the loss of any machines.

In my opinion, this partitioning is unnecessary, because I intend to change the redundancy mode to double right away.

Also, I don’t know how FDB reacts to fdbcli --exec "configure double" while already partitioning a large database :grimacing:.

Questions:

1/ Is there a way to avoid the partitioning of the database between the addition of extra machines and the redundancy configuration change?

2/ What are the risks of changing the redundancy configuration when FDB is partitioning the data through the just added extra machines?

Thanks to all of you who take the time to help :pray:.

I think the solution is to use a hidden command to disable the data distributor: datadistribution off. Then re-enable data distribution with datadistribution on after you add the 2 machines.

Code reference: fdbcli.actor.cpp - apple/foundationdb - Sourcegraph

As for the second question, “What are the risks of changing the redundancy configuration when FDB is partitioning the data through the just added extra machines?”:
I think the main risk is wasting some read/write bandwidth on the source and destination machines. The data distributor itself has a limited quota for data movement. If you can afford an extra 600Mb/s (>50*100M, where 50 is the parallelism limit and 100M is the max shard size) of data movement for a while, I think it's fine.

@sfc-gh-xwang Thank you for your detailed answer :+1::pray:.

I finally had the time to give it a try :sweat_smile:, and indeed fdbcli --exec "datadistribution off" works like a charm.

I ran it before the 2 machines joined my cluster. Then, as soon as I had updated the redundancy mode and included the additional coordinators, I ran `fdbcli --exec "datadistribution on"`.
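
As a rough sketch, the sequence described above could be scripted like this. The function names and the DRY_RUN switch are hypothetical helpers for illustration, and using coordinators auto is an assumption (the coordinators can also be listed explicitly). With DRY_RUN=1, the default here, the commands are only printed so the order can be reviewed before touching a real cluster.

```shell
#!/bin/sh
# Sketch only, not a tested production procedure.
# DRY_RUN=1 (default) prints each fdbcli command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run (or just print) a single fdbcli command.
run_fdbcli() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'fdbcli --exec "%s"\n' "$1"
    else
        fdbcli --exec "$1"
    fi
}

# Step 1: stop data distribution BEFORE the new machines join,
# so that FDB does not start moving shards onto them in single mode.
before_adding_machines() {
    run_fdbcli "datadistribution off"
}

# Step 2 (manual, not scripted here): start fdbserver on the 2 new
# machines with the same cluster file so that they join the cluster.

# Step 3: raise redundancy, update the coordinators, then resume data
# distribution. Each command runs only if the previous one succeeded.
# Assumption: "coordinators auto" is acceptable for this cluster.
after_adding_machines() {
    run_fdbcli "configure double" &&
    run_fdbcli "coordinators auto" &&
    run_fdbcli "datadistribution on"
}
```

Call before_adding_machines first, bring the new machines up, then call after_adding_machines; set DRY_RUN=0 only once the printed commands look right.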

Thank you for your second comment too! Since I'll add the nodes when traffic on our platform is low, and we have less than 100 GB of data, I think the extra bandwidth cost should indeed be negligible.