Run FoundationDB cluster on multiple Kubernetes clusters

Hi all,

I want to run FoundationDB across multiple Kubernetes clusters for fault tolerance and to improve application speed, by deploying one Kubernetes cluster per AWS/GCP region, e.g. us-east-2, ap-south-1, eu-west-3.

I already deploy and run an FDB cluster with the FDB operator on one Kubernetes cluster.
To run on another K8s cluster, if I apply the YAML file to the new cluster, how can it synchronize data with the currently active cluster?
Or is there another way to run an FDB cluster across multiple K8s clusters and make sure the data in those clusters is consistent?

Another question: if my current FDB cluster uses the ssd-2 storage engine, and I want to create a new K8s cluster and extend the current FDB cluster onto it with the memory storage engine, can I do that?

I really appreciate your support.

Do you run an FDB multi-region HA setup, i.e., each region has one EKS cluster and correspondingly an FDB operator?

I want to run FoundationDB across multiple Kubernetes clusters for fault tolerance and to improve application speed, by deploying one Kubernetes cluster per AWS/GCP region, e.g. us-east-2, ap-south-1, eu-west-3.

One requirement is that all Pods are able to communicate with each other. So you have to ensure that your network setup supports this.

I already deploy and run an FDB cluster with the FDB operator on one Kubernetes cluster.
To run on another K8s cluster, if I apply the YAML file to the new cluster, how can it synchronize data with the currently active cluster?
Or is there another way to run an FDB cluster across multiple K8s clusters and make sure the data in those clusters is consistent?

This config example shows the basic ideas of how to set up an FDB cluster across multiple Kubernetes clusters (each handled as a dedicated DC). In the future we plan to make operating multi-DC (or multi-Kubernetes) FDB clusters easier with the operator. The important bit is the following (we have an issue open in GitHub to document the HA setup better, and it’s probably a good time to do that soon):

First you have to create an FDB cluster running in a single Kubernetes cluster (you already have that) and ensure that the configuration of that cluster uses a datacenter ID that is unique within the FDB cluster and a unique processGroupIDPrefix (you can use the same value for both): https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/config/tests/multi_dc/final.yaml#L17, e.g. in your case you could call it us-east-2. Now you can update the config with the following changes:

Update the regions config of the database configuration in the currently existing FoundationDBCluster (see: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/config/tests/multi_dc/final.yaml#L25-L45). You can probably copy the config from there and just replace the DC IDs with the real names. A higher priority means the DC is preferred over one with a lower priority.
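To illustrate, here is a rough sketch of such a regions block for your three regions, with us-east-2 as the primary, ap-south-1 as the remote and eu-west-3 as the satellite. This only shows the shape of the config; take the exact field layout, priorities and redundancy settings from the linked final.yaml and the FDB multi-region documentation:

  databaseConfiguration:
    usable_regions: 2
    regions:
    - datacenters:
      - id: us-east-2        # primary DC, preferred because of the higher priority
        priority: 1
      - id: eu-west-3        # satellite DC for the primary region
        priority: 1
        satellite: 1
        satellite_logs: 3
      satellite_redundancy_mode: one_satellite_double
    - datacenters:
      - id: ap-south-1       # remote DC, takes over if the primary region fails
        priority: 0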

In addition, update the seedConnectionString to the current connection string of the cluster (https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/config/tests/multi_dc/final.yaml#L21). You can get that either by fetching the status of the FoundationDBCluster or by using fdbcli.
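For example, something along these lines (a sketch, assuming your FoundationDBCluster resource is called fdb-cluster and lives in the default namespace, and that pod names follow the <cluster>-<process-class>-<n> pattern shown later in this thread; the second command just reads the cluster file that the operator mounts into every FDB pod):

kubectl -n default get foundationdbcluster fdb-cluster -o jsonpath='{.status.connectionString}'

# or from inside one of the running FDB pods
kubectl -n default exec -it fdb-cluster-storage-1 -c foundationdb -- cat /var/dynamic-conf/fdb.cluster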

Once this is done you can create the FoundationDBCluster resources in the other two Kubernetes clusters using the same configuration, except for processGroupIDPrefix and dataCenter, which should be replaced by your new values (the ones that you defined in the database configuration).
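As a sketch, only the fields that differ from your existing FoundationDBCluster are shown here; dataCenter and processGroupIDPrefix get the new DC’s name, and everything else, including the databaseConfiguration, stays identical to the first cluster:

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: fdb-cluster
spec:
  version: 7.1.15
  processGroupIDPrefix: ap-south-1     # unique per Kubernetes cluster / DC
  dataCenter: ap-south-1               # must match one of the DC IDs in the regions config
  seedConnectionString: <connection string copied from the first cluster>
  databaseConfiguration:
    # same regions / usable_regions block as in the first cluster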

Depending on the size of the cluster (or rather the amount of stored data) it might take minutes or hours to synchronise all data between the clusters.

One heads-up: if you make any changes to the database configuration once you have an HA cluster running, you have to make those changes in all configurations (in all FoundationDBClusters), otherwise you have conflicting settings, which could result in the different operator instances trying to reconfigure the DB the whole time.

Another question: if my current FDB cluster uses the ssd-2 storage engine, and I want to create a new K8s cluster and extend the current FDB cluster onto it with the memory storage engine, can I do that?

That is currently not possible, since the database configuration is, from an FDB perspective, a global configuration. You might be able to use the storage wiggle mechanism or something similar to emulate this, but I don’t think it was created for such a use case.

Hi @johscheuer,
Thanks for your answer.
My cluster is currently running on a local k3d cluster and I am moving to an AWS EKS cluster.
I use the default fdb-kubernetes-operator deployment and the default cluster-fdb from GitHub. I changed the version to 7.1.15.
My cluster is created with the default eksctl command:

eksctl create cluster --name k8s-cluster \
                      --region us-east-2 \
                      --zones us-east-2a,us-east-2b,us-east-2c \
                      --nodes 3 \
                      --node-zones us-east-2c \
                      --instance-types=t3.xlarge \
                      --ssh-access \
                      --ssh-public-key k8s-key \
                      --managed \
                      --nodegroup-name k8s-cluster

I got an error and read the logs, but they did not have enough information to debug.
Here are my YAML files and the error log.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fdb-kubernetes-operator-controller-manager
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: null
  name: fdb-kubernetes-operator-manager-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - events
  - persistentvolumeclaims
  - pods
  - secrets
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbbackups
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbbackups/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbclusters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbclusters/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbrestores
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbrestores/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: null
  name: fdb-kubernetes-operator-manager-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: fdb-kubernetes-operator-manager-role
subjects:
- kind: ServiceAccount
  name: fdb-kubernetes-operator-controller-manager
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fdb-kubernetes-operator-controller-manager
    control-plane: controller-manager
  name: fdb-kubernetes-operator-controller-manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fdb-kubernetes-operator-controller-manager
  template:
    metadata:
      labels:
        app: fdb-kubernetes-operator-controller-manager
        control-plane: controller-manager
    spec:
      containers:
      - command:
        - /manager
        env:
        - name: WATCH_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: foundationdb/fdb-kubernetes-operator:v1.12.0
        name: manager
        ports:
        - containerPort: 8080
          name: metrics
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 500m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /tmp
          name: tmp
        - mountPath: /var/log/fdb
          name: logs
        - mountPath: /usr/bin/fdb
          name: fdb-binaries
      initContainers:
      - args:
        - --copy-library
        - "7.1"
        - --copy-binary
        - fdbcli
        - --copy-binary
        - fdbbackup
        - --copy-binary
        - fdbrestore
        - --output-dir
        - /var/output-files/7.1.15
        - --init-mode
        image: foundationdb/foundationdb-kubernetes-sidecar:7.1.15-1
        name: foundationdb-kubernetes-init-7-1
        volumeMounts:
        - mountPath: /var/output-files
          name: fdb-binaries
      securityContext:
        fsGroup: 4059
        runAsGroup: 4059
        runAsUser: 4059
      serviceAccountName: fdb-kubernetes-operator-controller-manager
      terminationGracePeriodSeconds: 10
      volumes:
      - emptyDir: {}
        name: tmp
      - emptyDir: {}
        name: logs
      - emptyDir: {}
        name: fdb-binaries

Cluster

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: fdb-cluster
spec:
  automationOptions:
    replacements:
      enabled: true
  faultDomain:
    key: foundationdb.org/none
  labels:
    filterOnOwnerReference: false
    matchLabels:
      foundationdb.org/fdb-cluster-name: fdb-cluster
    processClassLabels:
    - foundationdb.org/fdb-process-class
    processGroupIDLabels:
    - foundationdb.org/fdb-process-group-id
  minimumUptimeSecondsForBounce: 60
  processCounts:
    cluster_controller: 1
    stateless: -1
  processes:
    general:
      customParameters:
      - knob_disable_posix_kernel_aio=1
      podTemplate:
        spec:
          containers:
          - name: foundationdb
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
            securityContext:
              runAsUser: 0
          - name: foundationdb-kubernetes-sidecar
            resources:
              limits:
                cpu: 100m
                memory: 128Mi
              requests:
                cpu: 100m
                memory: 128Mi
            securityContext:
              runAsUser: 0
          initContainers:
          - name: foundationdb-kubernetes-init
            resources:
              limits:
                cpu: 100m
                memory: 128Mi
              requests:
                cpu: 100m
                memory: 128Mi
            securityContext:
              runAsUser: 0
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 16G
  routing:
    headlessService: true
  sidecarContainer:
    enableLivenessProbe: true
    enableReadinessProbe: false
  useExplicitListenAddress: true
  version: 7.1.15

And here is the log:

kubectl logs -f --all-containers --namespace default fdb-kubernetes-operator-controller-manager-558dff4978-jtf59
{"level":"info","ts":1674140978.2800398,"logger":"setup","msg":"Operator starting in single namespace mode","namespace":"default"}
{"level":"info","ts":1674140978.5891607,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1674140978.5906844,"logger":"setup","msg":"Moving FDB binary file","currentPath":"/usr/bin/fdb/7.1.15/bin/7.1.15/fdbcli","newPath":"/usr/bin/fdb/7.1/fdbcli"}
{"level":"info","ts":1674140978.5913315,"logger":"setup","msg":"Moving FDB binary file","currentPath":"/usr/bin/fdb/7.1.15/bin/7.1.15/fdbbackup","newPath":"/usr/bin/fdb/7.1/fdbbackup"}
{"level":"info","ts":1674140978.5913672,"logger":"setup","msg":"Moving FDB binary file","currentPath":"/usr/bin/fdb/7.1.15/bin/7.1.15/fdbrestore","newPath":"/usr/bin/fdb/7.1/fdbrestore"}
{"level":"info","ts":1674140978.5914366,"logger":"setup","msg":"Moving FDB library file","currentPath":"/usr/bin/fdb/7.1.15/lib/libfdb_c.so","newPath":"/usr/bin/fdb/libfdb_c_7.1.15.so"}
{"level":"info","ts":1674140978.592798,"logger":"setup","msg":"setup manager"}
{"level":"info","ts":1674140978.5933084,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
{"level":"info","ts":1674140978.6942077,"msg":"attempting to acquire leader lease default/fdb-kubernetes-operator...\n"}
{"level":"info","ts":1674140997.3442132,"msg":"successfully acquired lease default/fdb-kubernetes-operator\n"}
{"level":"info","ts":1674140997.3444355,"msg":"Starting EventSource","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster","source":"kind source: *v1beta2.FoundationDBCluster"}
{"level":"info","ts":1674140997.344499,"msg":"Starting EventSource","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster","source":"kind source: *v1.Pod"}
{"level":"info","ts":1674140997.344508,"msg":"Starting EventSource","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster","source":"kind source: *v1.PersistentVolumeClaim"}
{"level":"info","ts":1674140997.3445222,"msg":"Starting EventSource","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":1674140997.3445284,"msg":"Starting EventSource","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster","source":"kind source: *v1.Service"}
{"level":"info","ts":1674140997.3445196,"msg":"Starting EventSource","controller":"foundationdbbackup","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBBackup","source":"kind source: *v1beta2.FoundationDBBackup"}
{"level":"info","ts":1674140997.344577,"msg":"Starting EventSource","controller":"foundationdbbackup","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBBackup","source":"kind source: *v1.Deployment"}
{"level":"info","ts":1674140997.3445826,"msg":"Starting Controller","controller":"foundationdbbackup","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBBackup"}
{"level":"info","ts":1674140997.3445604,"msg":"Starting Controller","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster"}
{"level":"info","ts":1674140997.3447385,"msg":"Starting EventSource","controller":"foundationdbrestore","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBRestore","source":"kind source: *v1beta2.FoundationDBRestore"}
{"level":"info","ts":1674140997.3447611,"msg":"Starting Controller","controller":"foundationdbrestore","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBRestore"}
{"level":"info","ts":1674140997.4456177,"msg":"Starting workers","controller":"foundationdbrestore","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBRestore","worker count":1}
{"level":"info","ts":1674140997.4456224,"msg":"Starting workers","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster","worker count":1}
{"level":"info","ts":1674140997.445709,"msg":"Starting workers","controller":"foundationdbbackup","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBBackup","worker count":1}
{"level":"info","ts":1674141143.9275608,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.updateStatus"}
{"level":"info","ts":1674141143.953133,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.updateLockConfiguration"}
{"level":"info","ts":1674141143.953202,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.updateConfigMap"}
{"level":"info","ts":1674141143.9655888,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.checkClientCompatibility"}
{"level":"info","ts":1674141143.9656744,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.deletePodsForBuggification"}
{"level":"info","ts":1674141143.9657261,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.replaceMisconfiguredProcessGroups"}
{"level":"info","ts":1674141143.9657629,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.replaceFailedProcessGroups"}
{"level":"info","ts":1674141143.965837,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.addProcessGroups"}
{"level":"info","ts":1674141143.9881136,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.addServices"}
{"level":"info","ts":1674141144.0116718,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.addPVCs"}
{"level":"info","ts":1674141144.10706,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.addPods"}
{"level":"info","ts":1674141144.4174266,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.generateInitialClusterFile"}
{"level":"info","ts":1674141144.4174857,"logger":"controller","msg":"Generating initial cluster file","namespace":"default","cluster":"fdb-cluster","reconciler":"generateInitialClusterFile"}
{"level":"info","ts":1674141144.4177673,"logger":"controller","msg":"Reconciliation terminated early","namespace":"default","cluster":"fdb-cluster","subReconciler":"controllers.generateInitialClusterFile","requeueAfter":15,"message":"cannot find enough running Pods to recruit coordinators. Require 3, got 0 Pods"}

Could you please give me some advice on how to debug and get past this?
Thanks again

That indicates that the Pods are not running. Please check with kubectl get pods whether they are running or not.
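For example, something like the following (the label is the one the operator puts on all pods of the cluster, assuming the cluster is named fdb-cluster; the pod name in the second command is a placeholder):

kubectl get pods -l foundationdb.org/fdb-cluster-name=fdb-cluster
kubectl describe pod <name-of-a-pod-that-is-not-Running>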

Hi @johscheuer ,

I described the operator controller pod and the other pods (for example the log pod). I don’t know why the log pod cannot be assigned to any node and doesn’t have an IP.
Here is the describe output.

Controller pod

kubectl describe -n default pod fdb-cluster-cluster-controller-1 
Name:         fdb-cluster-cluster-controller-1
Namespace:    default
Priority:     0
Node:         ip-192-168-92-31.us-east-2.compute.internal/192.168.92.31
Start Time:   Sat, 21 Jan 2023 08:09:21 +0700
Labels:       foundationdb.org/fdb-cluster-name=fdb-cluster
              foundationdb.org/fdb-process-class=cluster_controller
              foundationdb.org/fdb-process-group-id=cluster_controller-1
Annotations:  foundationdb.org/last-applied-config-map: 9b36ec310c5b9ac26df65de5581651e2067da44baae55a624fd899e1be605ccd
              foundationdb.org/last-applied-spec: c9892c4e99cfc7117c6fce6eb10e5be74248ca5f1051a7d96b4909037153eac7
              foundationdb.org/public-ip-source: pod
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           192.168.64.111
IPs:
  IP:           192.168.64.111
Controlled By:  FoundationDBCluster/fdb-cluster
Init Containers:
  foundationdb-kubernetes-init:
    Container ID:  docker://1957b99a15b37fa8aa2cedf4e150b42e723aea39083ab273c3832ec18b3efeee
    Image:         foundationdb/foundationdb-kubernetes-sidecar:7.1.25-1
    Image ID:      docker-pullable://foundationdb/foundationdb-kubernetes-sidecar@sha256:4dd174fc0fb3f900818e8a556ee27d218ed4ea810df454bd86d869ec450a3d7d
    Port:          <none>
    Host Port:     <none>
    Args:
      --copy-file
      fdb.cluster
      --input-monitor-conf
      fdbmonitor.conf
      --copy-binary
      fdbserver
      --copy-binary
      fdbcli
      --main-container-version
      7.1.25
      --substitute-variable
      FDB_POD_IP
      --init-mode
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 21 Jan 2023 08:09:23 +0700
      Finished:     Sat, 21 Jan 2023 08:09:24 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Environment:
      FDB_PUBLIC_IP:     (v1:status.podIP)
      FDB_POD_IP:        (v1:status.podIP)
      FDB_MACHINE_ID:    (v1:spec.nodeName)
      FDB_ZONE_ID:       (v1:spec.nodeName)
      FDB_INSTANCE_ID:  cluster_controller-1
    Mounts:
      /var/input-files from config-map (rw)
      /var/output-files from dynamic-conf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwz4j (ro)
Containers:
  foundationdb:
    Container ID:  docker://f2fe3ba8ef0fcf849f91a586625512bf98ecd0f1cc1c6a71f145bfed90f6b798
    Image:         foundationdb/foundationdb:7.1.25
    Image ID:      docker-pullable://foundationdb/foundationdb@sha256:e57f2781cc553e46f3aae35c2e37be3cfb27553c9d2eb119523e707f5203636c
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
    Args:
      fdbmonitor --conffile /var/dynamic-conf/fdbmonitor.conf --lockfile /var/dynamic-conf/fdbmonitor.lockfile --loggroup fdb-cluster >> /var/log/fdb-trace-logs/fdbmonitor-$(date '+%Y-%m-%d').log 2>&1
    State:          Running
      Started:      Sat, 21 Jan 2023 08:09:25 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Environment:
      FDB_CLUSTER_FILE:  /var/dynamic-conf/fdb.cluster
    Mounts:
      /var/dynamic-conf from dynamic-conf (rw)
      /var/fdb/data from data (rw)
      /var/log/fdb-trace-logs from fdb-trace-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwz4j (ro)
  foundationdb-kubernetes-sidecar:
    Container ID:  docker://d18386511811d31e8a2ded9452ea5115f0904824035dbbc966e33f160db5a41e
    Image:         foundationdb/foundationdb-kubernetes-sidecar:7.1.25-1
    Image ID:      docker-pullable://foundationdb/foundationdb-kubernetes-sidecar@sha256:4dd174fc0fb3f900818e8a556ee27d218ed4ea810df454bd86d869ec450a3d7d
    Port:          <none>
    Host Port:     <none>
    Args:
      --copy-file
      fdb.cluster
      --input-monitor-conf
      fdbmonitor.conf
      --copy-binary
      fdbserver
      --copy-binary
      fdbcli
      --main-container-version
      7.1.25
      --substitute-variable
      FDB_POD_IP
    State:          Running
      Started:      Sat, 21 Jan 2023 08:09:25 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Liveness:  tcp-socket :8080 delay=0s timeout=1s period=30s #success=1 #failure=5
    Environment:
      FDB_PUBLIC_IP:          (v1:status.podIP)
      FDB_POD_IP:             (v1:status.podIP)
      FDB_MACHINE_ID:         (v1:spec.nodeName)
      FDB_ZONE_ID:            (v1:spec.nodeName)
      FDB_INSTANCE_ID:       cluster_controller-1
      FDB_TLS_VERIFY_PEERS:  
    Mounts:
      /var/input-files from config-map (rw)
      /var/output-files from dynamic-conf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwz4j (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  dynamic-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config-map:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fdb-cluster-config
    Optional:  false
  fdb-trace-logs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-dwz4j:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m17s  default-scheduler  Successfully assigned default/fdb-cluster-cluster-controller-1 to ip-192-168-92-31.us-east-2.compute.internal
  Normal  Pulled     3m15s  kubelet            Container image "foundationdb/foundationdb-kubernetes-sidecar:7.1.25-1" already present on machine
  Normal  Created    3m15s  kubelet            Created container foundationdb-kubernetes-init
  Normal  Started    3m15s  kubelet            Started container foundationdb-kubernetes-init
  Normal  Pulled     3m13s  kubelet            Container image "foundationdb/foundationdb:7.1.25" already present on machine
  Normal  Created    3m13s  kubelet            Created container foundationdb
  Normal  Started    3m13s  kubelet            Started container foundationdb
  Normal  Pulled     3m13s  kubelet            Container image "foundationdb/foundationdb-kubernetes-sidecar:7.1.25-1" already present on machine
  Normal  Created    3m13s  kubelet            Created container foundationdb-kubernetes-sidecar
  Normal  Started    3m13s  kubelet            Started container foundationdb-kubernetes-sidecar

Log pod

kubectl describe -n default pod fdb-cluster-log-1
Name:           fdb-cluster-log-1
Namespace:      default
Priority:       0
Node:           <none>
Labels:         foundationdb.org/fdb-cluster-name=fdb-cluster
                foundationdb.org/fdb-process-class=log
                foundationdb.org/fdb-process-group-id=log-1
Annotations:    foundationdb.org/last-applied-config-map: d5cbe95e876334c9a47d4ad1f8d3bb629a0e399d72740cea9523b4f7ae4a78de
                foundationdb.org/last-applied-spec: 27b8456807fda54d60206013f021ce03e0c17da2ee164a609d94c80d66af9af4
                foundationdb.org/public-ip-source: pod
                kubernetes.io/psp: eks.privileged
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  FoundationDBCluster/fdb-cluster
Init Containers:
  foundationdb-kubernetes-init:
    Image:      foundationdb/foundationdb-kubernetes-sidecar:7.1.25-1
    Port:       <none>
    Host Port:  <none>
    Args:
      --copy-file
      fdb.cluster
      --input-monitor-conf
      fdbmonitor.conf
      --copy-binary
      fdbserver
      --copy-binary
      fdbcli
      --main-container-version
      7.1.25
      --substitute-variable
      FDB_POD_IP
      --init-mode
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Environment:
      FDB_PUBLIC_IP:     (v1:status.podIP)
      FDB_POD_IP:        (v1:status.podIP)
      FDB_MACHINE_ID:    (v1:spec.nodeName)
      FDB_ZONE_ID:       (v1:spec.nodeName)
      FDB_INSTANCE_ID:  log-1
    Mounts:
      /var/input-files from config-map (rw)
      /var/output-files from dynamic-conf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ctzcd (ro)
Containers:
  foundationdb:
    Image:      foundationdb/foundationdb:7.1.25
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
    Args:
      fdbmonitor --conffile /var/dynamic-conf/fdbmonitor.conf --lockfile /var/dynamic-conf/fdbmonitor.lockfile --loggroup fdb-cluster >> /var/log/fdb-trace-logs/fdbmonitor-$(date '+%Y-%m-%d').log 2>&1
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Environment:
      FDB_CLUSTER_FILE:  /var/dynamic-conf/fdb.cluster
    Mounts:
      /var/dynamic-conf from dynamic-conf (rw)
      /var/fdb/data from data (rw)
      /var/log/fdb-trace-logs from fdb-trace-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ctzcd (ro)
  foundationdb-kubernetes-sidecar:
    Image:      foundationdb/foundationdb-kubernetes-sidecar:7.1.25-1
    Port:       <none>
    Host Port:  <none>
    Args:
      --copy-file
      fdb.cluster
      --input-monitor-conf
      fdbmonitor.conf
      --copy-binary
      fdbserver
      --copy-binary
      fdbcli
      --main-container-version
      7.1.25
      --substitute-variable
      FDB_POD_IP
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Liveness:  tcp-socket :8080 delay=0s timeout=1s period=30s #success=1 #failure=5
    Environment:
      FDB_PUBLIC_IP:          (v1:status.podIP)
      FDB_POD_IP:             (v1:status.podIP)
      FDB_MACHINE_ID:         (v1:spec.nodeName)
      FDB_ZONE_ID:            (v1:spec.nodeName)
      FDB_INSTANCE_ID:       log-1
      FDB_TLS_VERIFY_PEERS:  
    Mounts:
      /var/input-files from config-map (rw)
      /var/output-files from dynamic-conf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ctzcd (ro)
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  fdb-cluster-log-1-data
    ReadOnly:   false
  dynamic-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config-map:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fdb-cluster-config
    Optional:  false
  fdb-trace-logs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-ctzcd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

In the FDB GitHub repo I see a Dockerfile for EKS; do I need to build with that Dockerfile to make it work on EKS?
For now, I don’t have any progress on EKS.

I use the same configuration on GCP and it works.
It’s disappointing that it doesn’t on AWS.

I use the same configuration on GCP and it works.
It’s disappointing that it doesn’t on AWS.

Can you please verify whether the PersistentVolumes are created? I would suspect that the PersistentVolumeClaim is stuck waiting for the underlying EBS volume to be created. The official docs from AWS should be helpful for debugging this: Amazon EKS.

Hi @johscheuer,
I described the storage class and the PVCs and can see that they were created.
But the PVCs are stuck in Pending:

kubectl get storageclass
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  3d23h
kubectl describe storageclasses.storage.k8s.io gp2
Name:            gp2
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           kubernetes.io/aws-ebs
Parameters:            fsType=ext4,type=gp2
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
kubectl get pvc
NAME                             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
fdb-cluster-log-1-data           Pending                                      gp2            6m20s
fdb-cluster-log-2-data           Pending                                      gp2            6m20s
fdb-cluster-log-3-data           Pending                                      gp2            6m20s
fdb-cluster-storage-1-data       Pending                                      gp2            6m20s
fdb-cluster-storage-2-data       Pending                                      gp2            6m20s
fdb-cluster-storage-3-data       Pending                                      gp2            6m20s
fdb-cluster-transaction-1-data   Pending                                      gp2            6m20s
fdb-cluster-transaction-2-data   Pending                                      gp2            6m20s
Name:          fdb-cluster-log-1-data
Namespace:     default
StorageClass:  gp2
Status:        Pending
Volume:        
Labels:        foundationdb.org/fdb-cluster-name=fdb-cluster
               foundationdb.org/fdb-process-class=log
               foundationdb.org/fdb-process-group-id=log-1
Annotations:   foundationdb.org/last-applied-spec: 26c46a48b72a4b786be02e7290a9607cb265913105131323fb04e86284cea53d
               volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
               volume.kubernetes.io/selected-node: ip-192-168-95-38.us-east-2.compute.internal
               volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       fdb-cluster-log-1
Events:
  Type    Reason                Age                  From                         Message
  ----    ------                ----                 ----                         -------
  Normal  WaitForFirstConsumer  7m4s                 persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ExternalProvisioning  93s (x25 over 7m3s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator

I’m not an EKS expert, but the link that I shared should contain all the information. Are you running an EKS version with Kubernetes newer than 1.23? If so, you have to follow the note from the link:

The in-tree Amazon EBS storage provisioner is deprecated. If you are upgrading your cluster to version 1.23 , then you must first install the Amazon EBS driver before updating your cluster. For more information, see Amazon EBS CSI migration frequently asked questions.

More information is here (same AWS document): Amazon EBS CSI driver - Amazon EKS. It seems like your EKS cluster is not properly configured and is not able to create the underlying EBS volumes. The AWS docs should have all the required information to set up the EKS cluster with the EBS CSI driver: Amazon EBS CSI migration frequently asked questions - Amazon EKS
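In case it helps, here is a rough sketch of the eksctl steps from those AWS docs for enabling the EBS CSI driver as a managed add-on. The cluster name and region are taken from your eksctl command above, and the account ID and role name are placeholders; please treat the AWS documentation as the authoritative source:

# the cluster needs an IAM OIDC provider
eksctl utils associate-iam-oidc-provider --cluster k8s-cluster --region us-east-2 --approve

# create an IAM role for the EBS CSI controller service account
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster k8s-cluster \
  --region us-east-2 \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve \
  --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole

# install the EBS CSI driver as an EKS add-on using that role
eksctl create addon \
  --name aws-ebs-csi-driver \
  --cluster k8s-cluster \
  --region us-east-2 \
  --service-account-role-arn arn:aws:iam::<account-id>:role/AmazonEKS_EBS_CSI_DriverRole \
  --force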

Hi @johscheuer
Thanks for your support, I will read it carefully.
I think the problem comes from IAM and policies on AWS.

I have some concerns about running FDB on multiple K8s clusters.
I looked at https://github.com/FoundationDB/fdb-kubernetes-operator/tree/main/config/tests and it has 2 directories: multi_dc and multi_kc. According to the official documentation, FDB supports only 2 regions, with multiple DCs per region, so how can I deploy FDB across 3 regions like us-east-2, ap-south-1, eu-west-3?
Another question about the read/write mechanism of a multi-cluster FDB setup: can each request/transaction read/write to the closest region? For example, I live in Asia, so my request will go to the south-east-2 region, while for someone coming from the US it will go to us-east-2.

That is what I need.
I have some concerns in this post:
https://forums.foundationdb.org/t/run-foundationdb-cluster-on-multi-kuberbetes-clusters/3741/12?u=mason

In your case the correct choice is the https://github.com/FoundationDB/fdb-kubernetes-operator/tree/main/config/tests/multi_dc configuration. You can deploy it in a triplet configuration (that’s exactly the example in the multi_dc configuration). In that case you have a primary DC, a remote DC and a satellite DC. The satellite should be the one that is closest to the other two regions. Under normal operation every write has to go to the primary DC and the satellite. Since your regions are quite far apart, I’m not sure about the performance of the FDB cluster for writes.

Another question about the read/write mechanism of a multi-cluster FDB setup: can each request/transaction read/write to the closest region? For example, I live in Asia, so my request will go to the south-east-2 region, while for someone coming from the US it will go to us-east-2.

That’s not possible: writes will always go to the (active) primary region and have to be committed to the primary DC and the satellite DC. A read, in theory, can go to every region, with the limitation that the GRV request will always be made to the primary DC, and if a version is not available in the current (remote) region the read will be blocked until the version/mutation is available or a timeout is hit. So basically the multi-region setup in FDB is an active-passive model.

Can you explain more about the difference between multi_dc and multi_kc?
As I look at the examples, both of them create 3 Kubernetes clusters. It’s not clear to me.

Can you explain more about the difference between multi_dc and multi_kc?
As I look at the examples, both of them create 3 Kubernetes clusters. It’s not clear to me.

multi_dc → creates an FDB cluster in a multi-region config across multiple Kubernetes clusters.
multi_kc → spans one FDB cluster across multiple Kubernetes clusters.

@johscheuer

So if I deploy an FDB cluster with multi_kc, and each KC is in a different region, is it equivalent to multi_dc?

My use case is: at first we deploy FDB in one K8s cluster in the US, then a couple of months later we span two more K8s clusters, one in the EU and one in Asia.

So if I deploy an FDB cluster with multi_kc, and each KC is in a different region, is it equivalent to multi_dc?

Not really: the multi-dc setup will result in a multi-region configuration: Multi Region Replication · apple/foundationdb Wiki · GitHub. The multi-kc case is one huge FDB cluster spanned across multiple Kubernetes clusters, which means all transaction processes must communicate with each other (for your use case that would mean very high latencies).

My use case is: at first we deploy FDB in one K8s cluster in the US, then a couple of months later we span two more K8s clusters, one in the EU and one in Asia.

In the end that depends on the latencies that you get between those regions and whether those are good enough for your use case (I don’t know what latencies your application requires).

multi_dc mode has 2 usable_regions (regions in the FDB sense), and each region has multiple DCs. As I understand it, each DC is a Kubernetes cluster in a real cloud region like us-east-2, eu-west-1, ap-south-2. Is my understanding right?

For multi_kc, does it just create one big FDB cluster to handle a heavy workload?

multi_dc mode has 2 usable_regions (regions in the FDB sense), and each region has multiple DCs. As I understand it, each DC is a Kubernetes cluster in a real cloud region like us-east-2, eu-west-1, ap-south-2. Is my understanding right?

That depends on your deployment model; the operator is pretty flexible here and supports different modes. A “dc” could be in the same Kubernetes cluster but in a different namespace, could be a different Kubernetes cluster in the same region, could be a Kubernetes cluster in a different region, etc.

For multi_kc, does it just create one big FDB cluster to handle a heavy workload?

Not necessarily a heavy workload, but also for fault tolerance, to survive a single Kubernetes cluster outage.