What is the right way to make a copy of an existing cluster?

Hello!

I have a working 3-node fdb cluster with some data. Sometimes I need to make a copy of this cluster: a cluster on another three nodes (with different IP addresses) containing the same data.

I’ve considered the following approaches:

  1. Back up all data from the first cluster with fdbbackup and restore it to the new one with fdbrestore.

  2. Create a DR site of the source cluster on the new cluster with fdbdr, then stop the replication once the data has been copied.

Both approaches produce a copy of the source cluster, but they take a long time for large databases.

I’m looking for a faster way to make such a copy.

My idea is to shut down the whole source cluster (this is sometimes acceptable), copy the contents of the storage directories from the source nodes to the same directories on the destination nodes, and then start both clusters. Copying files is much faster than extracting and inserting key-value pairs.

This can be extremely fast if the cloud or virtualization platform supports linked copies of data volumes. In that case I can shut down the source cluster, snapshot its data volumes, and start the source cluster again. Then I can create clones of the snapshot as linked copies (this takes constant time because no data is actually copied) and use them on the destination cluster.

The problem is that when I try to use data files from one cluster on another, fdb does not see the “stolen” data:

[oleg@oleg2 devops]$ fdbcli -C standby-fdb.cluster
Using cluster file `standby-fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `standby-fdb.cluster'.

The coordinator(s) have no record of this database. Either the coordinator
addresses are incorrect, the coordination state on those machines is missing, or
no database has been created.

192.168.56.81:4500 (reachable)
192.168.56.82:4500 (reachable)
192.168.56.83:4500 (reachable)

Unable to locate the data distributor worker.

Unable to locate the ratekeeper worker.

Is there any way to force fdb to use data files copied from another cluster?

Is there another way to clone fdb clusters quickly?

The error you are getting indicates that the coordinators don’t have the data files associated with the database. As part of your procedure, you would need to copy the coordinators’ data files from your source cluster (coordination-*.fdq) to the processes you are choosing as the coordinators in the new cluster. Then you would need to copy the existing cluster file to all processes in the new cluster, but with the list of coordinators updated to the new processes where you copied the files.

If you do all that (in addition to copying all other files properly), I believe the cluster should work as expected.
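
As a rough pre-flight check before starting the destination cluster, something like the following could confirm that the coordination files actually arrived on a node chosen as a coordinator. This is only a sketch: the function name and the example path are my own assumptions, and the only thing taken from the advice above is the coordination-*.fdq file name pattern.

```python
from pathlib import Path


def find_coordination_files(data_dir):
    """Return the sorted coordination-*.fdq files found anywhere under data_dir.

    data_dir is whatever directory your fdbserver processes use for data;
    the layout below it is not assumed, so the search is recursive.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("coordination-*.fdq"))


# Example (hypothetical path, adjust to the datadir in your own configuration):
#   files = find_coordination_files("/var/lib/foundationdb/data")
#   if not files:
#       print("no coordination files found; this node cannot act as a coordinator")
```

An empty result on a node listed in the new cluster file would be consistent with the "coordinators have no record of this database" message.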

There is a feature that supports fast backups via volume snapshotting, though I honestly know very little about it or how to use it. Maybe someone else can chime in to describe whether it’s ready for use and how it works.

Thanks, ajbeamon.

I’ve tried it, it works great!

My problem was that I’d copied all the data files, including the coordinators’ data files, but used the new cluster file with a new cluster ID.

The working procedure:

  1. Shut both source and destination clusters down completely
  2. Copy the entire contents of the data dirs from the source nodes to the destination nodes
  3. Merge cluster files from two clusters into the new one. Take the first part (cluster description and cluster id) from the source cluster and the second part (the list of coordinator processes) from the destination cluster. Use it as the new destination cluster file.
  4. Start both source and destination clusters up
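
Step 3 can be sketched in a few lines of Python. This assumes the cluster file layout shown elsewhere in this thread (comment lines starting with #, then a single description:ID@coordinators line); the function names and the sample destination description and ID are made up for illustration.

```python
def read_connection_string(text):
    """Return the first non-blank, non-comment line of a cluster file's text."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    raise ValueError("no connection string found in cluster file")


def merge_cluster_files(source_text, destination_text):
    """Keep description:ID from the source and the coordinator list from the destination."""
    identity, sep, _ = read_connection_string(source_text).partition("@")
    if not sep:
        raise ValueError("source cluster file has no '@' separator")
    _, sep, coordinators = read_connection_string(destination_text).partition("@")
    if not sep:
        raise ValueError("destination cluster file has no '@' separator")
    return f"{identity}@{coordinators}"


if __name__ == "__main__":
    source = "# DO NOT EDIT!\nold_description:m57KiSJG12bBhcxB1uRTcWn9fiUcO63X@127.0.0.1:4500\n"
    # Hypothetical destination file; only its coordinator list is used.
    destination = "standby:abc123@192.168.56.81:4500,192.168.56.82:4500,192.168.56.83:4500\n"
    print(merge_cluster_files(source, destination))
```

The resulting line is what goes into the cluster file on every process of the destination cluster before it is started.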

The only remaining problem is that there are now two clusters with the same cluster ID. How can I change it for the cloned cluster?

Ah yes, good question. In fdbcli, you can use the coordinators command to set what is called the description field in the cluster file, which also causes the ID field to be regenerated (see the cluster file format documentation). To change it, you would execute:

fdb> coordinators description=<new_description_string>

Below is a sequence to demonstrate the effect:

aj@ubuntu:~$ cat /etc/foundationdb/fdb.cluster
# DO NOT EDIT!
# This file is auto-generated, it is not to be edited by hand
old_description:m57KiSJG12bBhcxB1uRTcWn9fiUcO63X@127.0.0.1:4500
aj@ubuntu:~$ fdbcli
Using cluster file 'fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type 'help'.
fdb> coordinators
Cluster description: old_description
Cluster coordinators (1): 127.0.0.1:4500
Type 'help coordinators' to learn how to change this information.
fdb> coordinators description=new_description
Coordination state changed
fdb> 

aj@ubuntu:~$ cat fdb.cluster
# DO NOT EDIT!
# This file is auto-generated, it is not to be edited by hand
new_description:9LVXRhzH3lXH7rIDD6OOUrnHSLgDCUB7@127.0.0.1:4500
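
If the rename is scripted for each clone, it may help to check the new description before passing it to fdbcli. The sketch below assumes, as I understand the cluster file format documentation, that a description may contain only letters, digits, and underscores; the function name is hypothetical and fdbcli itself remains the authority on what it accepts.

```python
import re

# Assumed rule: one or more of a-z, A-Z, 0-9 or underscore.
_DESCRIPTION_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_description(description):
    """Return True if description uses only the characters assumed to be allowed."""
    return _DESCRIPTION_RE.fullmatch(description) is not None


if __name__ == "__main__":
    for candidate in ("new_description", "clone-2", ""):
        print(repr(candidate), is_valid_description(candidate))
```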

Can I supply my own cluster ID instead of having it regenerated?

fdb> coordinators description=<new_description_string> id=<new_cluster_id>

No, and it will change anytime you change your coordinators list or description.

Did the log format change between 6.2 and 6.3? I’d like to know whether this method can be used between clusters running different versions, such as 6.2 to 6.3 or 6.2 to 7.1. Thanks!

Changing the version in this process would essentially be an upgrade or downgrade and should behave the same as if you upgraded or downgraded in place.