FDB Cluster upgrade does not delete old pods

Hi

I am using fdb-operator to set up a very basic FDB cluster on EKS with EBS as storage. I am running into multiple issues with FDB cluster setup and upgrade:

  1. The cluster never becomes available, and checking the status with the kubectl fdb plugin gives the errors below even though all the pods are in the Running state:
kubectl get pod
NAME                                         READY   STATUS    RESTARTS   AGE
astradot-fdb-cluster-controller-1            2/2     Running   0          9m58s
astradot-fdb-log-1                           2/2     Running   0          9m58s
astradot-fdb-log-2                           2/2     Running   0          9m58s
astradot-fdb-log-3                           2/2     Running   0          9m58s
astradot-fdb-log-4                           2/2     Running   0          9m58s
astradot-fdb-storage-1                       2/2     Running   0          9m58s
astradot-fdb-storage-2                       2/2     Running   0          9m58s
astradot-fdb-storage-3                       2/2     Running   0          9m58s
kubectl fdb analyze --all-clusters
Checking cluster: astradot/astradot-fdb
✖ Cluster is not available
✖ Cluster is not fully replicated
✖ Cluster is not reconciled
✖ ProcessGroup: cluster_controller-1 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-1 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-2 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-3 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-4 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: storage-1 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: storage-2 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: storage-3 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✔ Pods are all running and available
Checking cluster: astradot/astradot-fdb with auto-fix: false
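If it helps, this is how the processes inside a pod can be inspected directly (a rough sketch; the container name foundationdb matches my podTemplate below, and the availability of fdbcli and ps inside the image is an assumption):

```shell
# Ask for cluster status from inside a storage pod
# (assumes fdbcli is available in the main foundationdb container)
kubectl exec -n astradot astradot-fdb-storage-1 -c foundationdb -- \
  fdbcli --exec "status minimal"

# List processes in the container to see whether fdbserver is actually up
# (assumes a ps binary exists in the image)
kubectl exec -n astradot astradot-fdb-storage-1 -c foundationdb -- ps -ef
```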

This is my FoundationDBCluster manifest:

---
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: astradot-fdb
  namespace: astradot
  annotations:
    argocd.argoproj.io/sync-wave: "1502"
spec:
  version: 7.1.27
  automationOptions:
    replacements:
      enabled: true
  minimumUptimeSecondsForBounce: 60
  processCounts:
    cluster_controller: 1
    stateless: -1
  processes:
    general:
      podTemplate:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: karpenter.sh/provisioner-name
                        operator: Exists
                      - key: kubernetes.io/arch
                        operator: In
                        values: ["amd64"]
          containers:
            - name: foundationdb
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
            - name: foundationdb-kubernetes-sidecar
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
      volumeClaimTemplate:
        spec:
          # storageClassName: local-storage-disk-rancher
          resources:
            requests:
              storage: 20G
  routing:
    defineDNSLocalityFields: true
  sidecarContainer:
    enableLivenessProbe: true

  2. When I change any parameter in the FoundationDBCluster object, the operator does not delete the old running pods. For example, this is what happened when I updated the version of the FDB cluster:
kubectl get pods -w -owide
NAME                                         READY   STATUS    RESTARTS   AGE     IP            NODE                          NOMINATED NODE   READINESS GATES
astradot-fdb-cluster-controller-1            2/2     Running   0          19m     10.0.73.82    ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-cluster-controller-2            2/2     Running   0          3m23s   10.0.93.96    ip-10-0-83-61.ec2.internal    <none>           <none>
astradot-fdb-log-1                           2/2     Running   0          19m     10.0.65.81    ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-log-2                           2/2     Running   0          19m     10.0.75.192   ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-log-3                           2/2     Running   0          19m     10.0.78.50    ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-log-4                           2/2     Running   0          19m     10.0.64.93    ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-log-5                           2/2     Running   0          3m24s   10.0.66.148   ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-log-6                           2/2     Running   0          3m23s   10.0.71.155   ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-log-7                           2/2     Running   0          3m23s   10.0.79.13    ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-log-8                           2/2     Running   0          3m23s   10.0.93.100   ip-10-0-83-61.ec2.internal    <none>           <none>
astradot-fdb-storage-1                       2/2     Running   0          19m     10.0.69.122   ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-storage-2                       2/2     Running   0          19m     10.0.72.135   ip-10-0-72-98.ec2.internal    <none>           <none>
astradot-fdb-storage-3                       2/2     Running   0          19m     10.0.77.15    ip-10-0-72-98.ec2.internal    <none>           <none>

As you can see, we had 4 astradot-fdb-log pods, and after the version upgrade there are 8. Ideally there should still be only 4 pods, just running the upgraded FDB version.
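In case it is useful, this is how I look at what the operator thinks the reconciliation state is (a sketch using plain kubectl against the status of the FoundationDBCluster resource, no plugin needed):

```shell
# Compare the observed generation against the desired one and inspect
# the per-process-group status recorded by the operator
kubectl get foundationdbcluster astradot-fdb -n astradot -o yaml

# Or just the high-level view with recent events
kubectl describe foundationdbcluster astradot-fdb -n astradot
```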

Let me know if more details are required and I’ll provide them.

Thanks

Could you share the operator logs and the operator version? The MissingProcesses condition indicates that the fdbserver processes inside the Pods are not running.
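Something like the following should collect both (a sketch assuming the default install, where the operator deployment is named fdb-kubernetes-operator-controller-manager; adjust the namespace and deployment name to your setup):

```shell
# Operator logs (adjust -n and the deployment name to your install)
kubectl logs -n astradot deployment/fdb-kubernetes-operator-controller-manager

# Operator version, read from the container image tag
kubectl get deployment fdb-kubernetes-operator-controller-manager -n astradot \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```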