Hi
I am using fdb-operator to set up a very basic FDB cluster on EKS with EBS as storage, and I am running into multiple issues with the cluster setup and upgrade:
- The cluster never becomes available, and the kubectl fdb plugin reports the errors below even though all of the pods are in the Running state:
kubectl get pod
NAME                                READY   STATUS    RESTARTS   AGE
astradot-fdb-cluster-controller-1   2/2     Running   0          9m58s
astradot-fdb-log-1                  2/2     Running   0          9m58s
astradot-fdb-log-2                  2/2     Running   0          9m58s
astradot-fdb-log-3                  2/2     Running   0          9m58s
astradot-fdb-log-4                  2/2     Running   0          9m58s
astradot-fdb-storage-1              2/2     Running   0          9m58s
astradot-fdb-storage-2              2/2     Running   0          9m58s
astradot-fdb-storage-3              2/2     Running   0          9m58s
kubectl fdb analyze --all-clusters
Checking cluster: astradot/astradot-fdb
✖ Cluster is not available
✖ Cluster is not fully replicated
✖ Cluster is not reconciled
✖ ProcessGroup: cluster_controller-1 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-1 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-2 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-3 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: log-4 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: storage-1 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: storage-2 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✖ ProcessGroup: storage-3 has the following condition: MissingProcesses since 2023-07-21 04:23:29 +0530 IST
✔ Pods are all running and available
Checking cluster: astradot/astradot-fdb with auto-fix: false
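In case it helps with debugging, this is roughly how I have also been checking the state from inside the pods and from the operator. The pod and container names come from the listing above; the operator deployment name is the default from the operator's manifests, so treat it as an assumption if your install differs:

# Run fdbcli inside one of the storage pods (assumes the container's
# FDB_CLUSTER_FILE env var already points at the right cluster file):
kubectl exec -n astradot astradot-fdb-storage-1 -c foundationdb -- fdbcli --exec "status"

# Operator logs (deployment name assumed; adjust to your install):
kubectl logs -n astradot deployment/fdb-kubernetes-operator-controller-manager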
This is my FoundationDBCluster manifest:
---
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: astradot-fdb
  namespace: astradot
  annotations:
    argocd.argoproj.io/sync-wave: "1502"
spec:
  version: 7.1.27
  automationOptions:
    replacements:
      enabled: true
  minimumUptimeSecondsForBounce: 60
  processCounts:
    cluster_controller: 1
    stateless: -1
  processes:
    general:
      podTemplate:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: karpenter.sh/provisioner-name
                        operator: Exists
                      - key: kubernetes.io/arch
                        operator: In
                        values: ["amd64"]
          containers:
            - name: foundationdb
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
            - name: foundationdb-kubernetes-sidecar
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
      volumeClaimTemplate:
        spec:
          # storageClassName: local-storage-disk-rancher
          resources:
            requests:
              storage: 20G
  routing:
    defineDNSLocalityFields: true
  sidecarContainer:
    enableLivenessProbe: true
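The manifest is applied through ArgoCD (hence the sync-wave annotation). To rule out anything ArgoCD-specific, I also inspect the object the operator actually sees with plain kubectl:

# Show the cluster resource and the operator's reported status/conditions:
kubectl get foundationdbcluster astradot-fdb -n astradot
kubectl describe foundationdbcluster astradot-fdb -n astradot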
- When I change any parameter in the FoundationDBCluster object, the operator does not delete the old running pods. For example, here is what happened when I updated the version of the FDB cluster (the exact change I made is shown after the listing):
kubectl get pods -w -owide
NAME                                READY   STATUS    RESTARTS   AGE     IP            NODE                         NOMINATED NODE   READINESS GATES
astradot-fdb-cluster-controller-1   2/2     Running   0          19m     10.0.73.82    ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-cluster-controller-2   2/2     Running   0          3m23s   10.0.93.96    ip-10-0-83-61.ec2.internal   <none>           <none>
astradot-fdb-log-1                  2/2     Running   0          19m     10.0.65.81    ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-log-2                  2/2     Running   0          19m     10.0.75.192   ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-log-3                  2/2     Running   0          19m     10.0.78.50    ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-log-4                  2/2     Running   0          19m     10.0.64.93    ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-log-5                  2/2     Running   0          3m24s   10.0.66.148   ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-log-6                  2/2     Running   0          3m23s   10.0.71.155   ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-log-7                  2/2     Running   0          3m23s   10.0.79.13    ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-log-8                  2/2     Running   0          3m23s   10.0.93.100   ip-10-0-83-61.ec2.internal   <none>           <none>
astradot-fdb-storage-1              2/2     Running   0          19m     10.0.69.122   ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-storage-2              2/2     Running   0          19m     10.0.72.135   ip-10-0-72-98.ec2.internal   <none>           <none>
astradot-fdb-storage-3              2/2     Running   0          19m     10.0.77.15    ip-10-0-72-98.ec2.internal   <none>           <none>
As you can see, we had 4 astradot-fdb-log pods, and after the version upgrade there are 8. Ideally there should still be only 4 log pods, just running the upgraded FDB version.
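For reference, the only field I changed for the upgrade was spec.version. I edit it in the manifest and let ArgoCD sync, but the change is equivalent to this patch (the target version below is only an illustrative example, not the version I actually moved to):

# Bump the FDB version on the cluster object (7.1.33 is illustrative):
kubectl patch foundationdbcluster astradot-fdb -n astradot --type merge \
  -p '{"spec":{"version":"7.1.33"}}'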
Let me know if more details are required and I'll provide them.
Thanks