Hi @johscheuer, how did you manage to get 9 coordinators for three_data_hall? (Initial support for three data hall replication by johscheuer · Pull Request #1651 · FoundationDB/fdb-kubernetes-operator · GitHub) Every time I test, I get 1.
Could you share some more information about your setup? Without any information about which operator version you use and how your FoundationDBCluster resources look, it's hard to help you.
I want to achieve three_data_hall across 3 AZs (cloud). I have nodes labeled with topology.kubernetes.io/zone=<respective_AZ>
and I am following https://github.com/FoundationDB/fdb-kubernetes-operator/tree/main/config/tests/three_data_hall for the deployment. The locality is also set:
localities:
  - key: "data_hall"
    value: $az
With the initial triple cluster I get the default 5 coordinators; when it changes to three_data_hall it goes to 1 instead of 9. What am I missing?
Thank you
Are you able to share the actual FoundationDBCluster resource? How many nodes do you have per AZ? You need at least 3 nodes per AZ, otherwise the operator is not able to select the right number of coordinators.
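To double-check the distribution, something like this should list the nodes per zone (assuming the label from your post):
# Show each node with its zone label to verify at least 3 nodes per AZ.
kubectl get nodes -L topology.kubernetes.io/zone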
localities:
  - key: "data_hall"
    value: $az
Shouldn't the $az be replaced with the actual value (not sure where this information is from)? The docs have some additional information about the setup: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/manual/fault_domains.md#three-data-hall-replication
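For three_data_hall the important part is that each per-AZ cluster sets the data hall locality; a minimal sketch, assuming the field names from the linked test config (name, FDB version, and zone value are placeholders, check final.yaml for the full spec):
# One cluster per AZ; the other two differ only in the names, the
# dataHall value, and (for the 2nd/3rd cluster) a seedConnectionString.
kubectl apply -f - <<EOF
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: test-cluster-az1
spec:
  version: 7.1.26
  processGroupIDPrefix: az1
  dataHall: az1
  databaseConfiguration:
    redundancy_mode: three_data_hall
EOF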
Please ignore the localities section, it is not working with the new API apps.foundationdb.org/v1beta2
(it comes from https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/three_datahall.md#coordinator-selection).
Back to the initial problem: I do have 50 nodes available, and pods are scheduled correctly on the respective nodes. The FoundationDBCluster definition I am using at the moment is https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/config/tests/three_data_hall/final.yaml
env vars:
AZ1=${AZ1:-"eastus2-1"}
AZ2=${AZ2:-"eastus2-2"}
AZ3=${AZ3:-"eastus2-3"}
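(These get substituted into the manifest before applying, one cluster per AZ; simplified sketch, the repo's test scripts do the actual substitution:)
# Render and apply one copy of the manifest per AZ, assuming
# $AZ-style placeholders in the manifest.
for az in "$AZ1" "$AZ2" "$AZ3"; do
  AZ="$az" envsubst < final.yaml | kubectl apply -f -
done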
observations:
- in triple mode I have 5 logs, 5 storage processes, and 5 coordinators
- when it switches to three_data_hall it shrinks: I have 3 clusters with 3 logs and 1 storage process each, and only 1 coordinator
- nothing obvious in the operator pod logs
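(The coordinator count is taken from the connection string in the cluster status, e.g.:)
# Every address in the connection string is a coordinator.
kubectl get foundationdbcluster -o jsonpath='{.items[*].status.connectionString}'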
What operator version is deployed in your case? And have you made sure you deployed the correct CRDs from the corresponding release branch (or newer)? I just tested the scripts/test setup that you referenced and everything works fine:
$ fdbcli --exec 'status details'
Using cluster file `/var/dynamic-conf/fdb.cluster'.
Configuration:
Redundancy mode - three_data_hall
Storage engine - ssd-2
Coordinators - 9
Desired Commit Proxies - 2
Desired GRV Proxies - 1
Desired Resolvers - 1
Desired Logs - 4
Desired Remote Logs - -1
Desired Log Routers - -1
Usable Regions - 1
...
Coordination servers:
192.168.0.3:4501 (reachable)
192.168.0.4:4501 (reachable)
192.168.0.5:4501 (reachable)
192.168.0.6:4501 (reachable)
192.168.0.23:4501 (reachable)
192.168.0.9:4501 (reachable)
192.168.0.11:4501 (reachable)
192.168.0.101:4501 (reachable)
192.168.0.102:4501 (reachable)
Is there anything interesting in the operator logs? Are you able to share them?
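You can grab them with something like this (label and namespace depend on how the operator was installed):
# Tail the operator logs; adjust the label selector to your deployment.
kubectl logs -l app=fdb-kubernetes-operator-controller-manager --tail=200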
I updated the CRDs but missed updating the operator image version. Thank you so much for the help!
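For anyone hitting the same issue, the fix was roughly the following (the tag is a placeholder, use the release matching your operator version, and repeat for the backup/restore CRDs):
# Update the CRDs and the operator image to the same release; the
# deployment/container names depend on how the operator was installed.
kubectl apply -f https://raw.githubusercontent.com/FoundationDB/fdb-kubernetes-operator/vX.Y.Z/config/crd/bases/apps.foundationdb.org_foundationdbclusters.yaml
kubectl set image deployment/fdb-kubernetes-operator-controller-manager \
  manager=foundationdb/fdb-kubernetes-operator:vX.Y.Z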