Three_data_hall and cross k8s cluster?

Is there a valid configuration when using fdb-operator for running in “three_data_hall” mode where each data hall is a separate kubernetes cluster?

The problem I’m having is that to run cross-k8s-cluster, where each k8s cluster corresponds to a data hall, I need to set the failure domain to something that maps 1-2-1 with a k8s cluster, which doesn’t play nicely with the “three_data_hall” redundancy mode, where it tries to create, for example, 9 coordinators across the failure domain.

Is this just something that isn’t supported or possible with the operator right now?

I need to set the failure domain to something that maps 1-2-1 with a k8s cluster, which doesn’t play nicely with the “three_data_hall” redundancy mode, where it tries to create, for example, 9 coordinators across the failure domain.

Could you explain what you mean by 1-2-1?

I think the setup should already work with the operator, though I haven’t tested it yet. You probably need some additional tooling to orchestrate the work across the three Kubernetes clusters. Something like this should work:

  1. Create the initial FDB cluster in the first Kubernetes cluster and set the data hall to kubernetes1 (or maybe something better):
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: test-cluster-kc1
spec:
  # The unified image supports making use of node labels, so setting up a three data hall cluster
  # is easier with the unified image.
  imageType: unified
  version: 7.1.63
  faultDomain:
    key: kubernetes.io/hostname
  dataHall: kubernetes1
  processGroupIDPrefix: kubernetes1
  databaseConfiguration:
    # Ensure that enough coordinators are available. The processes will be spread across the different zones.
    logs: 9
    storage: 9
    redundancy_mode: "triple"

This would bring up the initial FDB cluster.
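The clusters in the other Kubernetes clusters will need the connection string of this first cluster as their seedConnectionString. The operator reports it in the status of the FoundationDBCluster resource in the kubernetes1 cluster; the snippet below is only a sketch of what that looks like, the actual value is generated by the operator:

status:
  # Illustrative value only; the format is description:ID@coordinator1,coordinator2,...
  connectionString: test_cluster_kc1:Ab12Cd34@10.0.0.1:4501,10.0.0.2:4501,10.0.0.3:4501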

  2. In the second step you could bring up the other processes in the other Kubernetes clusters:
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  labels:
    cluster-group: test-cluster
  name: test-cluster
spec:
  # The unified image supports making use of node labels, so setting up a three data hall cluster
  # is easier with the unified image.
  imageType: unified
  version: 7.1.63
  faultDomain:
    key: kubernetes.io/hostname
  dataHall: kubernetes2
  processGroupIDPrefix: kubernetes2
  seedConnectionString: $connectionString # from the kubernetes1 cluster
  databaseConfiguration:
    # Ensure that enough coordinators are available. The processes will be spread across the different zones.
    logs: 9
    storage: 9
    redundancy_mode: "triple"

Repeat the same for kubernetes3 (a sketch is shown below). Now you should have a triple-replicated FDB cluster that spans 3 Kubernetes clusters (assuming network connectivity between them is given). All the processes should have their zoneid set to their host and the data hall locality set to the value of dataHall.
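For reference, here is an untested sketch of the corresponding kubernetes3 resource, simply mirroring the kubernetes2 example above (adjust names and values to your environment):

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  labels:
    cluster-group: test-cluster
  name: test-cluster
spec:
  imageType: unified
  version: 7.1.63
  faultDomain:
    key: kubernetes.io/hostname
  dataHall: kubernetes3
  processGroupIDPrefix: kubernetes3
  seedConnectionString: $connectionString # from the kubernetes1 cluster
  databaseConfiguration:
    logs: 9
    storage: 9
    redundancy_mode: "triple"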

  3. The last step is to change the redundancy mode for all clusters concurrently to three_data_hall:
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: test-cluster-kc1
spec:
  # The unified image supports making use of node labels, so setting up a three data hall cluster
  # is easier with the unified image.
  imageType: unified
  version: 7.1.63
  faultDomain:
    key: kubernetes.io/hostname
  dataHall: kubernetes1 # Make sure to update this
  processGroupIDPrefix: kubernetes1 # Make sure to update this
  databaseConfiguration:
    # Ensure that enough coordinators are available. The processes will be spread across the different zones.
    logs: 9
    storage: 9
    redundancy_mode: "three_data_hall"

Now you should have an FDB cluster running in three_data_hall mode that spans 3 Kubernetes clusters, where each data hall is one Kubernetes cluster.

If things work (or if something doesn’t work), please feel free to open a PR in the operator repo with additional docs. fdb-kubernetes-operator/docs/manual/fault_domains.md would be a good place for a new section “three data hall across 3 Kubernetes clusters”. The idea is basically the same as for the multi-DC setup: fdb-kubernetes-operator/config/tests/multi_dc.


So (and this is maybe where my understanding falls short) I thought that in order to have this cross-k8s-cluster reconciliation behave properly, I had to have failure domains that each corresponded to a Kubernetes cluster.

So instead of:

faultDomain:
  key: kubernetes.io/hostname

We needed each individual FoundationDBCluster object in each individual k8s cluster to have something like:

faultDomain:
  key: <somekey>
  value: zone-a
  zoneIndex: 1
  zoneCount: 3

faultDomain:
  key: <somekey>
  value: zone-b
  zoneIndex: 2
  zoneCount: 3

faultDomain:
  key: <somekey>
  value: zone-c
  zoneIndex: 3
  zoneCount: 3

But now, since the failure domain is an entire zone/Kubernetes cluster (they are the same in our case), the operator won’t necessarily spread pods over nodes within each zone, which makes it hard to satisfy 9 coordinators across the fault domain.

The idea of the zoneIndex was to use one Kubernetes cluster as the zoneid, e.g. for a cluster with triple replication where each Kubernetes cluster is a dedicated fault domain. This feature was added at a time when the operator was in its early stages and three_data_hall was not supported. With today’s flexibility, I don’t think we need this feature anymore and we should probably deprecate it, given that it’s causing more confusion than it helps. In general, for most cases the operator doesn’t care about the actual underlying infrastructure, as long as the connectivity is given. So you can run FDB clusters in different ways, across Kubernetes clusters or inside a single Kubernetes cluster.
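To make that concrete: in the three_data_hall setup above, every FoundationDBCluster resource keeps the node-level fault domain, and only the data hall value differs per Kubernetes cluster. A minimal sketch of the relevant spec fields, reusing the values from the examples above:

faultDomain:
  key: kubernetes.io/hostname # pods are still spread across the nodes within each Kubernetes cluster
dataHall: kubernetes2 # differs per Kubernetes cluster and becomes the data hall locality of its processes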

If you have any recommendations regarding the documentation, please feel free to open a PR. I’m happy to review it 🙂

Ah, understood now. Thank you for the advice. If I get some free time this week, I’ll try to pull together a PR with a suggested documentation update.