I’m trying to create a two-datacenter configuration, with three fdb nodes in each datacenter.
My requirements:
1. No more than two datacenters exist.
2. Active/passive: under normal conditions the first datacenter serves the data and the second keeps a replica.
3. When the entire second datacenter fails, the first datacenter should continue working without any downtime. Some performance penalty is acceptable.
4. When the entire first datacenter fails, it should be possible to activate the second datacenter to serve the data. Some downtime, a small data loss and a manual reconfiguration are acceptable.
5. The roles of the two datacenters can be switched for maintenance without any data loss. A small downtime and a manual reconfiguration are acceptable.
My first approach was to build a DR cluster. This solution satisfies all 5 requirements, but there are two problems:
1. The DR solution has a performance penalty, because all mutations also have to be written to the system keyspace, which doubles the write volume.
2. The DR solution appears to be obsoleted by the multi-region configuration.
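For reference, the DR approach is driven with the fdbdr tool. A minimal sketch of what such a setup looks like, assuming hypothetical cluster file paths for the two datacenters (the flags are from memory, so check them against the fdbdr usage output):

```
# Continuously replicate the active cluster (dc1) into the passive one (dc2)
fdbdr start -s /etc/foundationdb/dc1.cluster -d /etc/foundationdb/dc2.cluster

# Watch the replication state and lag
fdbdr status -s /etc/foundationdb/dc1.cluster -d /etc/foundationdb/dc2.cluster

# Planned role switch for maintenance (requirement 5): waits for the copy to catch up,
# then locks the old primary and makes dc2 the writable cluster, with no data loss
fdbdr switch -s /etc/foundationdb/dc1.cluster -d /etc/foundationdb/dc2.cluster

# Unplanned failover when dc1 is lost (requirement 4): detach the copy on the
# destination side only and accept a small data loss
fdbdr abort --dstonly -d /etc/foundationdb/dc2.cluster
```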
Then I tried the suggested multi-region configuration with two regions, each containing a single datacenter. I used six coordinator processes, three in each datacenter. But this configuration didn’t satisfy requirements 3 and 4: when either datacenter failed, three coordinator processes were not enough for the cluster to keep working. It seems the multi-region configuration only becomes useful with three or more datacenters, which contradicts requirement 1.
Any asymmetric configuration (e.g. 4 + 3 coordinators) does not survive the failure of the datacenter holding the most coordinators.
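For concreteness, a multi-region setup of that shape would look roughly like the sketch below; the datacenter ids and addresses are placeholders, not my exact commands:

```
# regions.json: two regions with a single datacenter each, dc1 preferred as primary
cat > /tmp/regions.json <<'EOF'
{
  "regions": [
    { "datacenters": [ { "id": "dc1", "priority": 1 } ] },
    { "datacenters": [ { "id": "dc2", "priority": 0 } ] }
  ]
}
EOF

# Apply the region configuration and keep a full replica in both regions
fdbcli --exec 'fileconfigure /tmp/regions.json'
fdbcli --exec 'configure usable_regions=2'

# Six coordinators, three per datacenter
fdbcli --exec 'coordinators 10.0.1.1:4500 10.0.1.2:4500 10.0.1.3:4500 10.0.2.1:4500 10.0.2.2:4500 10.0.2.3:4500'

# Every fdbserver process also has its datacenter id set
# (locality_dcid in foundationdb.conf, if I remember the option name correctly)
```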
This is because if an entire region fails, it is still possible to recover to the other region if you are willing to accept a small amount of data loss. However, if you have lost a majority of coordinators, this becomes much more difficult.
But I can’t find any step-by-step information on how to recover an fdb cluster when a majority of the coordinators is not available. Is it even feasible?
This is theoretically feasible, but not implemented. It’s been discussed as being on the roadmap before, and I’d suggest @markus.pilman as perhaps a good person to talk to about when that would be implemented.
Assuming you cannot hide a coordinator in some third region somewhere, then for now the “correct” way to do this would be to use two clusters and DR, precisely as you outlined above. I think I’d say more “deprecated” than “obsoleted”, as I haven’t heard of any plan to remove it yet, but @mengxu is welcome to correct me if I’m wrong. The double-write penalty is just something you’d have to live with until the better solution becomes available.
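For example, if a single coordinator-only process can live in some third location, a 2 + 2 + 1 layout keeps a coordinator majority through the loss of either main datacenter. A hypothetical sketch, with made-up addresses:

```
# Two coordinators per main datacenter plus one tie-breaker elsewhere:
# losing either main DC still leaves 3 of the 5 coordinators available
fdbcli --exec 'coordinators 10.0.1.1:4500 10.0.1.2:4500 10.0.2.1:4500 10.0.2.2:4500 10.0.3.1:4500'
```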
how to recover an fdb cluster when a majority of the coordinators is not available. Is it even feasible?
This is theoretically feasible, but not implemented.
I’ve managed to recover my fdb cluster when the majority of coordinators was lost.
Initial state: two datacenters, a primary and a remote, in two regions. Four coordinators: three in the primary and one in the remote.
Then the entire primary datacenter failed.
Steps to recover (a rough shell sketch follows the list):
Stop foundationdb in the secondary datacenter.
Modify the cluster file to have three coordinators in the second datacenter.
Copy the coordination-* files from the original coordinator in the second datacenter to the two new ones.
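A rough shell sketch of those steps, purely as an illustration; the service name, paths, cluster id and addresses are assumptions about a typical Linux install and will differ elsewhere:

```
# On the dc2 hosts: stop foundationdb in the secondary datacenter
sudo service foundationdb stop

# Rewrite the cluster file to list three dc2 coordinators instead of the lost dc1 ones
echo 'mydb:Ab1cD2eF@10.0.2.1:4500,10.0.2.2:4500,10.0.2.3:4500' | sudo tee /etc/foundationdb/fdb.cluster

# Copy the coordinated state from the surviving dc2 coordinator to the two new ones
scp /var/lib/foundationdb/data/4500/coordination-* 10.0.2.2:/var/lib/foundationdb/data/4500/
scp /var/lib/foundationdb/data/4500/coordination-* 10.0.2.3:/var/lib/foundationdb/data/4500/

# Then start foundationdb again on the dc2 hosts
sudo service foundationdb start
```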
That will work, but it’s possible that it won’t recover; if it does recover, you may lose an unbounded amount of data, and you theoretically open yourself up to database corruption.
A recovery (or more than one) could have happened, and written the new coordinated state to only the three coordinators in the primary. Your coordinated state in the secondary is thus stale, and doesn’t know that it is stale. When you copy it to more coordinators to get back to having a quorum, you’re restoring a stale coordinated state. It’s possible that it points only to transaction log instances that no longer exist, and thus recovery will block forever. It’s possible that it points to a subset of the older transaction log instances that do exist, and then you’ll lose all data written in the newer generations of transaction logs (but it will still be a consistent snapshot).
It’s also possible that the primary half of the database could come back online, unaware of your manual coordinator changes, and then you’d have two FDB clusters both trying to use the same transaction logs, which will probably result in very weird behavior.
So it will work, but there are a lot of caveats, which is why #2022 exists: to provide a safe(r) way of doing such an operation.
Yes, this scenario is not safe. For safety, I’d add a step:
6. Prevent the primary datacenter from starting up
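For example, something along these lines on every dc1 host before touching the coordinators in dc2 (assuming a systemd-managed install; the service name may differ):

```
# Make sure the old primary cannot come back with its stale coordinated state
sudo systemctl disable --now foundationdb
```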
But sometimes splitting a cluster into two independent parts is a desired goal, for example when I want to create a full copy of the data from a working cluster for testing.
Earlier I was using a DR cluster for cloning, but it seems this is also possible with the multi-region configuration.