FoundationDB fdbdr

#1

Hello,

I am trying to run fdbdr to copy data from a 3-node FDB cluster on AWS to another 3-node FDB cluster, also on AWS.

I used the following command:
fdbdr start -d /home/xxx/fdb_dest.cluster -s /etc/foundationdb/fdb.cluster

The DR on tag `default' was successfully submitted.

The status of the DR shows as follows:

fdbdr status -d /home/xxx/fdb_dest.cluster -s /etc/foundationdb/fdb.cluster

The DR on tag `default' is a complete copy of the primary database.

The DR is 0.203956 seconds behind.

However, when I try to query either the source or the destination cluster using fdbcli, I get "ERROR: Database is locked (1038)".
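For reference, this is roughly how I am querying; the key name `hello` is just an example, and the cluster file paths are the same ones used above:

```shell
# Read a key from the destination cluster. While the DR is active the
# secondary database is locked, so the read fails with error 1038.
fdbcli -C /home/xxx/fdb_dest.cluster --exec "get hello"
```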

My primary objective is to make a copy of the source cluster and then point my applications at the copy.

Thanks

(Alex Miller) #2

You need to stop the DR if you want to unlock and use the secondary cluster.

$ fdbdr abort -d /home/xxx/fdb_dest.cluster -s /etc/foundationdb/fdb.cluster

And then you should be able to use fdb_dest.cluster as normal.

#3

Thanks for the quick response. I did that.

$ fdbdr abort -d /home/xxx/fdb_dest.cluster -s /etc/foundationdb/fdb.cluster

The DR on tag `default' was successfully aborted.

After that, when I check the status:

$ fdbdr status -d /home/xxx/fdb_dest.cluster -s /etc/foundationdb/fdb.cluster
The previous DR on tag `default' has been partially aborted.
Abort the DR with --cleanup before starting a new DR.

The DR is 4.735403 seconds behind.

Why does it say that the DR on tag `default' was partially aborted?

(A.J. Beamon) #4

The abort command that Alex gave only stops the DR on the secondary cluster; it is what you would use if the primary weren't available. To cancel the DR activity on the primary as well, an additional step is required to stop mutation logging, which you can do by running the abort command with the --cleanup argument.
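Using the same cluster files from your earlier commands, the full abort would look roughly like this (both clusters need to be reachable for the cleanup to complete):

```shell
# Fully abort the DR: unlock the secondary and stop mutation logging
# on the primary.
fdbdr abort --cleanup -d /home/xxx/fdb_dest.cluster -s /etc/foundationdb/fdb.cluster

# Verify that no DR remains on the tag before starting a new one.
fdbdr status -d /home/xxx/fdb_dest.cluster -s /etc/foundationdb/fdb.cluster
```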

#5

Hi AJ,

Thanks for explaining!

#6

I had another question. From this exercise, it seems that if I have cluster A and cluster B and run a DR from A to B, both clusters become unavailable for reads and writes (locked). That would mean downtime for the applications using the clusters whenever DR is running. Is my understanding correct, or am I missing something?

Thanks

(A.J. Beamon) #7

Only the secondary should be locked; the primary should not be. Is this behavior repeatable?

#8

I could not reproduce it. I tried on a different cluster, and the source cluster could be queried fine while the DR was running. I'm not sure how I got into that state earlier. Thanks!