FoundationDB

Multi-region configuration - spreading copies inside a region across AZs


(Adam Kocoloski) #1

Hi folks, I’ve been digesting the region configuration docs and am hoping to check my understanding. How would you recommend configuring an FDB cluster to achieve the following?

  1. synchronous replication of transaction logs to two nearby “fault domains”
  2. asynchronous replication to a distant “fault domain”
  3. local replicas of each key in three nearby “fault domains”
  4. at least one replica in the distant “fault domain”

I’m specifically using “fault domain” here because I think I might be struggling a bit with the semantics of regions, datacenters, and data halls and the way those map to concepts in public cloud infrastructure. The example given in the docs covers requirements 1, 2, and 4, but then we bump into the fact that there can only be one primary datacenter in a region. If that datacenter fails then we need to fail over the entire region.

The documentation suggests mapping “datacenter” to “availability zone” in cloud infrastructure, but what about treating an AZ as a data hall? It seems to me that this would allow something like:

  • configure each region with one datacenter
  • use three_data_hall redundancy with the AZ as locality_data_hall
  • omit the satellite_redundancy_mode setting altogether

which would hit all 4 requirements. Does that make sense? Or is it a design point of FoundationDB’s region support to say “if an availability zone fails, we’re getting the heck out of the region ASAP”?
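
To make the proposal above concrete, here is a rough sketch of the region configuration it implies: two regions, one datacenter each, and no satellite settings at all. The datacenter ids ("dc-near", "dc-far") are made up for illustration, and the priority values (1 for the preferred primary, 0 for the backup) are an assumption modeled on the examples in the region docs rather than anything confirmed in this thread.

```python
import json


def build_regions(datacenter_ids):
    """Build a region config with exactly one datacenter per region.

    The first id is treated as the preferred primary (priority 1); the rest
    get priority 0. No satellite datacenters and no
    satellite_redundancy_mode key are emitted, per the proposal.
    """
    regions = []
    for rank, dc_id in enumerate(datacenter_ids):
        regions.append({
            "datacenters": [{
                "id": dc_id,  # hypothetical id, shared by every process in the region
                "priority": 1 if rank == 0 else 0,
            }]
        })
    return {"regions": regions}


config = build_regions(["dc-near", "dc-far"])
print(json.dumps(config, indent=2))
```

As I understand it, a file like this would be loaded through fdbcli, with three_data_hall set as the redundancy mode and two usable regions configured separately; treat those steps as something to verify against the docs rather than as a recipe.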

One other related question: is it possible to set a different redundancy mode in the “backup” region’s primary datacenter? I see some reference in the code to parameters like remote_redundancy_mode; is that relevant here?


(Evan Tschannen) #2

This setup does make sense, and should work.

As you were saying, in your region configuration you would have two regions, each with only one datacenter. Within each region, all processes have the same “datacenter”; however, you give them different “data_hall” values based on their availability zone.
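
A small sketch of that mapping, for illustration only: every process in a cloud region gets the same datacenter id, and its data hall is taken from its availability zone. The host inventory, the region and AZ names, and the locality key names ("dcid", "data_hall") are assumptions made for this example; check how your deployment tooling actually passes locality to fdbserver.

```python
from collections import defaultdict

# Hypothetical inventory: (host, cloud region, availability zone).
HOSTS = [
    ("host-1", "near", "near-az1"),
    ("host-2", "near", "near-az2"),
    ("host-3", "near", "near-az3"),
    ("host-4", "far", "far-az1"),
    ("host-5", "far", "far-az2"),
    ("host-6", "far", "far-az3"),
]


def locality_for(cloud_region, availability_zone):
    """One datacenter per cloud region; the AZ becomes the data hall."""
    return {"dcid": cloud_region, "data_hall": availability_zone}


def data_halls_per_datacenter(hosts):
    """Group the distinct data halls seen under each datacenter id."""
    halls = defaultdict(set)
    for _host, cloud_region, az in hosts:
        locality = locality_for(cloud_region, az)
        halls[locality["dcid"]].add(locality["data_hall"])
    return dict(halls)


halls = data_halls_per_datacenter(HOSTS)
for dcid in sorted(halls):
    # three_data_hall needs three distinct data halls in each region.
    assert len(halls[dcid]) >= 3, f"{dcid} has fewer than three data halls"
    print(dcid, sorted(halls[dcid]))
```

The check at the end is only a sanity test that each region spans three availability zones before three_data_hall is applied; it says nothing about how the cluster itself places replicas.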

It is currently not possible to have different numbers of replicas in each region. This is something that should be supported when we implement generic region configurations.

One huge warning is that the simulator does not currently test this configuration. I can try testing this setup if you are going to deploy using this configuration.


(Adam Kocoloski) #3

OK got it, thanks Evan.

Fixing the number of replicas to be the same in both regions is fine for now. Does it specifically mean that we would need to run three_data_hall in both regions in this case?

On the simulation front … is it just the combination of three_data_hall and multi-region that is not tested? At the moment I would say this seems like an attractive option for us, although the fact that it wasn’t already in the simulator makes me wonder if I’m missing a downside 🙂 We’re still in the design phase at the moment but will definitely keep this detail in mind as we get closer to deployment.


#4

Evan, could you please add this configuration (three_data_hall + multi-region) to the simulator?

The three_datacenter redundancy mode is not compatible with region configuration, so this would be a good fit for our deployment: it lets us use 3 availability zones as “data halls” with the three_data_hall redundancy mode, plus another region with the same setup as a backup with automatic failover.