I’m trying to understand DR in FoundationDB, and specifically how to configure things for the following scenario.
Colo facility (Primary):
This is where our main fdb cluster runs and where all active systems operate
AWS (DR site):
This is where I want to run a DR cluster
Getting DR up and running and syncing to the DR site is not an issue. What I’m failing to understand is how to fail over from one site to the other when the entire primary facility goes offline, e.g. the worst-case scenario of a hurricane knocking the facility offline for an extended period.
The documentation states that to ‘switch’, both clusters need to be online and accessible. In testing this proves to be true. I tested it with a simple single → single DR setup and then stopped the primary. Attempts to use `fdbdr switch` and `fdbdr abort` both just hang, and the DR cluster remains locked and unusable.
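For reference, the sequence I’m describing looks roughly like the sketch below. The cluster file paths are placeholders made up for illustration rather than my real ones, and the `run` helper is only there so the script prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Placeholder cluster file paths (assumptions for illustration only).
SRC=${SRC:-/etc/foundationdb/primary.cluster}
DST=${DST:-/etc/foundationdb/dr.cluster}

# Print each command by default; set RUN=1 to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# 1. Start DR from the primary to the DR cluster
#    (assumes dr_agent processes are already running).
run fdbdr start -s "$SRC" -d "$DST"

# 2. Stop the primary cluster out of band, then try either of these.
#    With the primary unreachable, both appear to hang.
run fdbdr switch -s "$SRC" -d "$DST"
run fdbdr abort -s "$SRC" -d "$DST"
```

Running it as-is only echoes the three `fdbdr` invocations, which makes it safe to paste into a reply or adapt to a test setup.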
So my questions are really:
- Is there a “blessed” method for manually disabling the DR process and unlocking the DR cluster when only the DR cluster is accessible?
- If we are able to do this, what happens when the inaccessible primary cluster becomes available again? Will it just start pushing data to the now-live DR site, resulting in corruption?
Maybe there is a blog post or tutorial I’ve been unable to find which covers DR better than the documentation.
Thanks