FDB Regions vs DR performance and usage for creating independent FDB clusters

Hello FDB experts,

I wanted to ask about FDB 7.x Regions vs DR performance, and whether one of them (or a different solution) is best suited to our use case below.

We have a source FDB cluster in CloudA which we’d like to copy to machines in CloudB, and have both running independently until we’re ready to switch to CloudB serving traffic weeks later. Once the initial sync between the CloudA and CloudB clusters is complete via FDB (we have tested mostly with regions so far), we want to keep serving from CloudA. After the FDB data is copied, our application tier will be responsible for keeping the 2 environments in sync, rather than FDB itself.

Our questions so far are:

  1. Does FDB Regions use fewer resources than FDB DR? (i.e. do we need more machines, a higher process count, or additional process classes at the source or destination to keep replication running alongside regular FDB usage?)
  2. Can Regions (or DR, or another solution) be used to copy a cluster and then split it into 2 independent clusters, as I attempted to describe above?

For the switch from replicating via FDB to our own ingest data path, we are able to stop ingestion at the source and start queuing up data to ensure consistency. We are planning to then disconnect the 2 clusters (stop replication via FDB), wait for each to report healthy, and resume ingestion of queued messages into the CloudA and CloudB FDB clusters separately.
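To make the "wait for each to report healthy" step concrete, here is a rough sketch of the gate we have in mind before resuming ingestion. It assumes `fdbcli` is on the PATH and that `status json` exposes `client.database_status` and `cluster.data.state` as we have seen in our own output; the function names and the cluster file paths passed in are hypothetical.

```python
import json
import subprocess


def fetch_status(cluster_file: str) -> dict:
    """Run `status json` through fdbcli against one cluster and parse the output."""
    out = subprocess.run(
        ["fdbcli", "-C", cluster_file, "--exec", "status json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def is_healthy(status: dict) -> bool:
    """True only if the database is available and healthy and data state is healthy."""
    db = status.get("client", {}).get("database_status", {})
    data_state = status.get("cluster", {}).get("data", {}).get("state", {})
    return (
        bool(db.get("available"))
        and bool(db.get("healthy"))
        and bool(data_state.get("healthy"))
    )


def safe_to_resume(statuses: dict) -> bool:
    """Resume queued ingestion only when every cluster (at least one) is healthy."""
    return bool(statuses) and all(is_healthy(s) for s in statuses.values())
```

The idea would be to call `fetch_status` once per cluster file in a polling loop and only drain the queue after `safe_to_resume` returns True for both CloudA and CloudB.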

Also worth noting from our FDB Regions testing: we needed to disable perpetual_storage_wiggle (we hadn’t set perpetual_storage_wiggle_locality), because replication using regions started losing data once the destination region with priority -1 was added and usable_regions was set to 2. I’m not sure whether the FoundationDB 7.1 “Configuration” page needs to be updated to mention the wiggle. I don’t think FDB7 enables it by default, but we needed it when switching from the FDB6 ssd storage engine to FDB7 ssd-redwood-1-experimental.
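For context, the region layout we tested looks roughly like what the sketch below generates: a primary region with priority 1 and a destination region with priority -1, with no satellites. The datacenter IDs `cloud_a` and `cloud_b` and the helper name are placeholders, not our real values.

```python
import json


def build_regions_config(primary_dc: str, remote_dc: str) -> dict:
    """Build a two-region configuration document.

    The remote datacenter gets priority -1 so it is never chosen as primary.
    The IDs must match the datacenter locality set on the fdbserver processes.
    """
    return {
        "regions": [
            {"datacenters": [{"id": primary_dc, "priority": 1}]},
            {"datacenters": [{"id": remote_dc, "priority": -1}]},
        ]
    }


# Hypothetical datacenter IDs, for illustration only.
config_text = json.dumps(build_regions_config("cloud_a", "cloud_b"), indent=2)
```

We save the resulting JSON to a file, load it in fdbcli with `fileconfigure`, and then run `configure usable_regions=2`.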

Thank you,
Boz

Interesting questions. I’ll throw out my thoughts on them.

  1. FDB Regions won’t use fewer resources than FDB DR. In normal operation the remote region serves no read traffic (because reads to the primary region are faster), so you may be tempted to give it fewer resources. However, when there is a problem in the primary region and the cluster needs to fail over, an under-provisioned remote region can cause performance degradation. Thus, it’s not a good idea to use fewer resources in the remote. Additionally, FDB Regions is normally deployed with a separate satellite site for storing log data, which is an additional resource requirement.
  2. Regions can give you almost synchronized data across two data centers. However, there is a few seconds of lag. Theoretically you could make 2 independent clusters, but there are a couple of problems to solve:

a) The lag problem. Either pause writes (e.g., lock the database) or use application-level logic to make sure the data is identical across the two regions.

b) Making separate clusters. You need separate cluster files, where each cluster has its own set of coordinators. Those coordinators need to hold the data from a majority of the original set of coordinators as it was before the separation. No tool exists today to do this, so some manual testing is needed first, and the procedure can then be turned into an automation tool.
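As a starting point for (b), here is a minimal sketch of the text-handling part only: reading a cluster file line of the form `description:ID@ip:port,ip:port,...`, writing one with a different coordinator set, and computing how many of the original coordinators make up a majority. It does not perform the separation itself, and the helper names, description, ID, and addresses are all made up for illustration.

```python
import re

# description is alphanumeric plus underscore, ID is alphanumeric.
_CLUSTER_LINE = re.compile(
    r"^(?P<desc>[A-Za-z0-9_]+):(?P<id>[A-Za-z0-9]+)@(?P<coords>.+)$"
)


def parse_cluster_file_line(line: str):
    """Split a cluster file line into (description, id, [coordinator addresses])."""
    m = _CLUSTER_LINE.match(line.strip())
    if not m:
        raise ValueError(f"not a valid cluster file line: {line!r}")
    coords = [c.strip() for c in m.group("coords").split(",") if c.strip()]
    return m.group("desc"), m.group("id"), coords


def build_cluster_file_line(desc: str, cluster_id: str, coordinators: list) -> str:
    """Assemble a cluster file line for a given coordinator set."""
    if not coordinators:
        raise ValueError("at least one coordinator is required")
    return f"{desc}:{cluster_id}@{','.join(coordinators)}"


def majority(n: int) -> int:
    """Smallest number of coordinators that forms a majority of n."""
    return n // 2 + 1
```

For example, with 3 original coordinators, `majority(3)` is 2, so each new coordinator set would need the state from at least 2 of them.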