Three_data_hall coordinators

stefanvasilic4 · November 16, 2023, 9:07pm

Hi @johscheuer, how did you manage to get 9 coordinators for three_data_hall? (Initial support for three data hall replication by johscheuer · Pull Request #1651 · FoundationDB/fdb-kubernetes-operator · GitHub) Every time I test, I get 1.

johscheuer · November 17, 2023, 6:50am

Could you share some more information about your setup? Without knowing which operator version you use and what your FoundationDBCluster resources look like, it’s hard to help you.

stefanvasilic4 · November 17, 2023, 1:32pm

I want to achieve three_data_hall across 3 AZs (cloud). My nodes are labeled with topology.kubernetes.io/zone=<respective_AZ>, and I am following https://github.com/FoundationDB/fdb-kubernetes-operator/tree/main/config/tests/three_data_hall for the deployment. The locality is also set:

localities:
  - key: "data_hall"
    value: $az

With the initial triple cluster I get the default 5 coordinators; when it changes to three_data_hall, the count goes to 1 instead of 9. What am I missing?
Thank you.

johscheuer · November 17, 2023, 3:55pm

Are you able to actually share the FoundationDBCluster resource? How many nodes do you have per AZ? You need at least 3 nodes per AZ, otherwise the operator is not able to select the right number of coordinators.

localities:
  - key: "data_hall"
    value: $az

Shouldn’t the $az be replaced with the actual value (not sure where this information is from)? The docs have some additional information about the setup: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/manual/fault_domains.md#three-data-hall-replication
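The "at least 3 nodes per AZ" requirement can be checked up front. The following is a minimal Python sketch, not operator code: it assumes that the 9 coordinators of three_data_hall are spread as 3 per data hall, and it takes a plain mapping from node name to the value of the topology.kubernetes.io/zone label. The function name `zones_short_of_nodes` and the node names are hypothetical.

```python
from collections import Counter

# Assumption: three_data_hall needs 3 coordinators in each of the 3 data halls,
# which is why every zone should offer at least 3 nodes.
MIN_NODES_PER_ZONE = 3


def zones_short_of_nodes(node_zones, expected_zones, min_per_zone=MIN_NODES_PER_ZONE):
    """Return {zone: node_count} for every expected zone with too few nodes.

    node_zones maps a node name to the value of its
    topology.kubernetes.io/zone label; expected_zones lists the zones
    that are meant to act as data halls.
    """
    counts = Counter(node_zones.values())
    # Counter yields 0 for zones that have no nodes at all.
    return {zone: counts[zone] for zone in expected_zones if counts[zone] < min_per_zone}


# Hypothetical inventory: node names are made up for illustration.
nodes = {
    "node-a": "eastus2-1", "node-b": "eastus2-1", "node-c": "eastus2-1",
    "node-d": "eastus2-2", "node-e": "eastus2-2", "node-f": "eastus2-2",
    "node-g": "eastus2-3", "node-h": "eastus2-3",
}
zones = ["eastus2-1", "eastus2-2", "eastus2-3"]
print(zones_short_of_nodes(nodes, zones))  # {'eastus2-3': 2}
```

An empty result means every zone has enough nodes, so the node count alone would not explain a wrong coordinator count.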

stefanvasilic4 · November 17, 2023, 4:48pm

Please ignore the localities section; it does not work with the new API apps.foundationdb.org/v1beta2 (it comes from https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/three_datahall.md#coordinator-selection).
Back to the initial problem: I have 50 nodes available, and pods are scheduled correctly on their respective nodes. The FoundationDBCluster definition I am using at the moment is https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/config/tests/three_data_hall/final.yaml

env vars:

AZ1=${AZ1:-"eastus2-1"}
AZ2=${AZ2:-"eastus2-2"}
AZ3=${AZ3:-"eastus2-3"}

observations:

In triple mode, I have 5 logs, 5 storage and 5 coordinators.
When it switches to three_data_hall, it shrinks: I have 3 clusters with 3 logs, 1 storage each and 1 coordinator.
There is nothing obvious in the operator pod logs.

johscheuer · November 21, 2023, 10:44am

What operator version is deployed in your case? And have you made sure that the correct CRD, from the corresponding release branch (or newer), is deployed? I just tested the scripts/test setup that you referenced and everything works fine:

$ fdbcli --exec 'status details'

Using cluster file `/var/dynamic-conf/fdb.cluster'.

Configuration:
  Redundancy mode        - three_data_hall
  Storage engine         - ssd-2
  Coordinators           - 9
  Desired Commit Proxies - 2
  Desired GRV Proxies    - 1
  Desired Resolvers      - 1
  Desired Logs           - 4
  Desired Remote Logs    - -1
  Desired Log Routers    - -1
  Usable Regions         - 1

...

Coordination servers:
  192.168.0.3:4501  (reachable)
  192.168.0.4:4501  (reachable)
  192.168.0.5:4501  (reachable)
  192.168.0.6:4501  (reachable)
  192.168.0.23:4501  (reachable)
  192.168.0.9:4501  (reachable)
  192.168.0.11:4501  (reachable)
  192.168.0.101:4501  (reachable)
  192.168.0.102:4501  (reachable)

Is there anything interesting in the operator logs? Are you able to share them?
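To compare the configured coordinator count with the listed coordination servers without reading the output by eye, a small Python sketch like the one below could parse the text printed by fdbcli. It assumes the layout matches the output quoted above; the function name `coordinator_summary` is hypothetical, and the sample text is abbreviated from that output.

```python
import re

# Matches the "Coordinators           - 9" line of the Configuration section.
COORDINATORS_RE = re.compile(r"^\s*Coordinators\s+-\s+(\d+)\s*$", re.MULTILINE)
# Matches lines such as "  192.168.0.3:4501  (reachable)".
REACHABLE_RE = re.compile(r"^\s*(\S+:\d+)\s+\(reachable\)\s*$", re.MULTILINE)


def coordinator_summary(status_text):
    """Return (configured_count, reachable_addresses) from status text.

    configured_count is None when no "Coordinators" line is found.
    """
    match = COORDINATORS_RE.search(status_text)
    configured = int(match.group(1)) if match else None
    reachable = REACHABLE_RE.findall(status_text)
    return configured, reachable


# Sample abbreviated from the output quoted in the thread.
sample = """\
Configuration:
  Redundancy mode        - three_data_hall
  Coordinators           - 9

Coordination servers:
  192.168.0.3:4501  (reachable)
  192.168.0.4:4501  (reachable)
  192.168.0.5:4501  (reachable)
  192.168.0.6:4501  (reachable)
  192.168.0.23:4501  (reachable)
  192.168.0.9:4501  (reachable)
  192.168.0.11:4501  (reachable)
  192.168.0.101:4501  (reachable)
  192.168.0.102:4501  (reachable)
"""

configured, reachable = coordinator_summary(sample)
print(configured, len(reachable))  # 9 9
```

A configured count of 1 where 9 is expected would reproduce the symptom reported at the start of the thread.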

stefanvasilic4 · November 21, 2023, 2:32pm

I updated the CRDs but forgot to update the operator image version. Thank you so much for the help!
