I need to set the failure domain to something that maps 1-2-1 with a Kubernetes cluster, which doesn’t play nicely with the “three_data_hall” redundancy mode, where it tries to create, for example, 9 coordinators across the failure domains.
Could you explain what you mean by 1-2-1?
I think the setup should already work with the operator, though I haven’t tested it yet. You will probably need some additional tooling to orchestrate the work across the three Kubernetes clusters. Something like this should work:
- Create the initial FDB cluster in the first Kubernetes cluster and set the data hall to `kubernetes1` (or something more descriptive):
```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: test-cluster-kc1
spec:
  # The unified image supports making use of node labels, so setting up a
  # three data hall cluster is easier with the unified image.
  imageType: unified
  version: 7.1.63
  faultDomain:
    key: kubernetes.io/hostname
  dataHall: kubernetes1
  processGroupIDPrefix: kubernetes1
  databaseConfiguration:
    # Ensure that enough coordinators are available. The processes will be
    # spread across the different zones.
    logs: 9
    storage: 9
    redundancy_mode: "triple"
```
This would bring up the initial FDB cluster.
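The clusters in the other Kubernetes clusters will need the connection string of this seed cluster. A minimal sketch for reading it from the resource status, assuming `kubectl` access to the first Kubernetes cluster and the resource name `test-cluster-kc1` from above (the context name `kc1` is a placeholder):

```sh
# Read the connection string the operator publishes in the cluster status.
kubectl --context kc1 get foundationdbcluster test-cluster-kc1 \
  -o jsonpath='{.status.connectionString}'
```

The printed value is what goes into `seedConnectionString` in the next step.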
- In the second step, bring up the processes in the other Kubernetes clusters:
```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  labels:
    cluster-group: test-cluster
  name: test-cluster-kc2
spec:
  # The unified image supports making use of node labels, so setting up a
  # three data hall cluster is easier with the unified image.
  imageType: unified
  version: 7.1.63
  faultDomain:
    key: kubernetes.io/hostname
  dataHall: kubernetes2
  processGroupIDPrefix: kubernetes2
  seedConnectionString: $connectionString # from the kubernetes1 cluster
  databaseConfiguration:
    # Ensure that enough coordinators are available. The processes will be
    # spread across the different zones.
    logs: 9
    storage: 9
    redundancy_mode: "triple"
```
Repeat the same for kubernetes3. Now you should have a triple-replicated FDB cluster that spans 3 Kubernetes clusters (assuming network connectivity between them is given). All processes should have their zoneid locality set to the hostname of the node they run on and their data hall locality set to the value of `dataHall`.
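If you want to double-check the localities, you can inspect the machine-readable status. A sketch, assuming a storage pod named `test-cluster-kc1-storage-1` (use `kubectl get pods` to find an actual pod name) and `jq` installed locally:

```sh
# Print address, data_hall and zoneid locality for every process in the cluster.
kubectl exec test-cluster-kc1-storage-1 -- fdbcli --exec 'status json' \
  | jq -r '.cluster.processes[] | "\(.address) \(.locality.data_hall) \(.locality.zoneid)"'
```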
- The last step is to change the redundancy mode for all clusters concurrently to `three_data_hall`:
```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: test-cluster-kc1
spec:
  # The unified image supports making use of node labels, so setting up a
  # three data hall cluster is easier with the unified image.
  imageType: unified
  version: 7.1.63
  faultDomain:
    key: kubernetes.io/hostname
  dataHall: kubernetes1 # Make sure to update this per Kubernetes cluster
  processGroupIDPrefix: kubernetes1 # Make sure to update this per Kubernetes cluster
  databaseConfiguration:
    # Ensure that enough coordinators are available. The processes will be
    # spread across the different zones.
    logs: 9
    storage: 9
    redundancy_mode: "three_data_hall"
```
Now you should have an FDB cluster running in three_data_hall mode that spans 3 Kubernetes clusters, where each data hall corresponds to one Kubernetes cluster.
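To confirm that the new redundancy mode is active, you can check the configuration section of the status (same assumptions as above regarding the pod name):

```sh
# Should print "three_data_hall" once all three clusters have been reconciled.
kubectl exec test-cluster-kc1-storage-1 -- fdbcli --exec 'status json' \
  | jq -r '.cluster.configuration.redundancy_mode'
```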
If things work (or if something doesn’t), please feel free to open a PR in the operator repo with additional docs. `docs/manual/fault_domains.md` in FoundationDB/fdb-kubernetes-operator would be a good place, with a new section “Three data hall across 3 Kubernetes clusters”. The idea is basically the same as for the multi-DC setup: `config/tests/multi_dc` in FoundationDB/fdb-kubernetes-operator.