`fdbbackup` process not started after creating a FoundationDBBackup resource

Hi all,
I have created an FDB cluster and a backup with the FDB operator.
After both were created successfully, my FoundationDBBackup only creates the backup_agent pod and runs the backup_agent command. It does not start the fdbbackup process, as the official backup documentation says it should:

Creating this resource will tell the operator to do the following things:

1. Create a sample-cluster-backup-agents deployment running FoundationDB backup agent processes connecting to the cluster.
2. Run an fdbbackup start command to start a backup at https://object-store.example:443/sample-cluster using the bucket name fdb-backups.

Below is my configuration for the FDB cluster and the FDBBackup:

# This file provides an example of a cluster you can run in a local testing
# environment.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: fdb-cluster
spec:
  version: 6.3.23
  faultDomain:
    key: foundationdb.org/none
  processCounts:
    stateless: -1
    cluster_controller: 1
    storage: 4
    log: 4
  routing:
    publicIPSource: "service"
    headlessService: true
  mainContainer:
    enableTls: true
  sidecarContainer:
    enableTls: true
  processes:
    general:
      customParameters:
        - "knob_disable_posix_kernel_aio=1"
        - "locality_test=1"
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: "16G"
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              env:
              - name: FDB_TLS_CERTIFICATE_FILE
                value: /var/fdb-certs/tls.crt
              - name: FDB_TLS_KEY_FILE
                value: /var/fdb-certs/tls.key
              - name: FDB_TLS_CA_FILE
                value: /var/trust-ca/ca.crt
              volumeMounts:
                - name: fdb-certs
                  mountPath: /var/fdb-certs
                - name: trust-ca
                  mountPath: /var/trust-ca
              resources:
                requests:
                  cpu: 250m
                  memory: 128Mi
            - name: foundationdb-kubernetes-sidecar
              env:
              - name: FDB_TLS_CERTIFICATE_FILE
                value: /var/fdb-certs/tls.crt
              - name: FDB_TLS_KEY_FILE
                value: /var/fdb-certs/tls.key
              - name: FDB_TLS_CA_FILE
                value: /var/trust-ca/ca.crt
              volumeMounts:
                - name: fdb-certs
                  mountPath: /var/fdb-certs
                - name: trust-ca
                  mountPath: /var/trust-ca
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 100m
                  memory: 128Mi
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 100m
                  memory: 128Mi
          volumes:
            - name: fdb-certs
              secret:
                secretName: fdb-cluster-certificate
            - name: trust-ca
              secret:
                secretName: root-secret
---
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBBackup
metadata:
  name: fdb-cluster
spec:
  blobStoreConfiguration:
    accountName: <access_key_id>@s3.us-east-2.amazonaws.com
    backupName: daily-backup
    bucket: fdb-backup
  clusterName: fdb-cluster
  customParameters:
    - "knob_http_request_aws_v4_header=true"
    - "knob_http_verbose_level=4"
  backupState: "Running"
  podTemplateSpec:
    spec:
      containers:
        - env:
          - name: FDB_BLOB_CREDENTIALS
            value: /var/backup-credentials/credentials
          - name: FDB_TLS_CERTIFICATE_FILE
            value: /var/fdb-certs/tls.crt
          - name: FDB_TLS_CA_FILE
            value: /var/root-ca/ca.crt
          - name: FDB_TLS_KEY_FILE
            value: /var/fdb-certs/tls.key
          name: foundationdb
          resources:
            limits:
              cpu: 250m
              memory: 128Mi
            requests:
              cpu: 250m
              memory: 128Mi
          securityContext:
            runAsGroup: 0
            runAsUser: 0
          volumeMounts:
            - mountPath: /var/fdb-certs
              name: fdb-certs
            - mountPath: /var/root-ca
              name: root-ca
            - mountPath: /var/backup-credentials
              name: backup-credentials
      initContainers:
        - name: foundationdb-kubernetes-init
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 100m
              memory: 128Mi
          securityContext:
            runAsUser: 0
        - name: root-certificate-init
          image: busybox:1.34
          command: ["/bin/sh", "-c", "cat /var/trust-ca/ca.crt /var/aws-root-certificate/aws-root-ca.crt >> /tmp/root-ca/ca.crt"]
          volumeMounts:
            - mountPath: /var/trust-ca
              name: trust-ca
            - mountPath: /var/aws-root-certificate
              name: aws-root-certificate
            - mountPath: /tmp/root-ca
              name: root-ca
      volumes:
        - name: backup-credentials
          secret:
            secretName: aws-credentials
        - name: fdb-certs
          secret:
            secretName: fdb-cluster-certificate
        - name: trust-ca
          secret:
            secretName: root-secret
        - name: aws-root-certificate
          secret:
            secretName: aws-root-ca
        - name: root-ca
          emptyDir: {}
  snapshotPeriodSeconds: 3600
  version: 6.3.23

Here is the describe output for my backup_agent pod:

Name:         fdb-cluster-backup-agents-6b67c7b8d4-gwhtv
Namespace:    default
Priority:     0
Node:         k3d-odiolog-cluster-server-0/172.21.0.2
Start Time:   Thu, 26 May 2022 13:18:10 +0700
Labels:       foundationdb.org/deployment-name=fdb-cluster-backup-agents
              pod-template-hash=6b67c7b8d4
Annotations:  <none>
Status:       Running
IP:           10.42.0.34
IPs:
  IP:           10.42.0.34
Controlled By:  ReplicaSet/fdb-cluster-backup-agents-6b67c7b8d4
Init Containers:
  foundationdb-kubernetes-init:
    Container ID:  containerd://12c7a846b9a60382e4b1cc715a568d1e992addea97081a075a3c092a78414ef4
    Image:         foundationdb/foundationdb-kubernetes-sidecar:6.3.23-1
    Image ID:      docker.io/foundationdb/foundationdb-kubernetes-sidecar@sha256:07e0ae17a7ca22cecfde69d03285cecbd483f1d1d3235044a1a779bf99596ac2
    Port:          <none>
    Host Port:     <none>
    Args:
      --copy-file
      fdb.cluster
      --require-not-empty
      fdb.cluster
      --init-mode
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 27 May 2022 09:53:19 +0700
      Finished:     Fri, 27 May 2022 09:53:19 +0700
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:        100m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /var/input-files from config-map (rw)
      /var/output-files from dynamic-conf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8prpf (ro)
  root-certificate-init:
    Container ID:  containerd://f8b142701a2e8ebb416e34285786c118a87c8674e9a714f6a9fb48748105d493
    Image:         busybox:1.34
    Image ID:      docker.io/library/busybox@sha256:d2b53584f580310186df7a2055ce3ff83cc0df6caacf1e3489bff8cf5d0af5d8
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      cat /var/trust-ca/ca.crt /var/aws-root-certificate/aws-root-ca.crt >> /tmp/root-ca/ca.crt
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 27 May 2022 09:53:19 +0700
      Finished:     Fri, 27 May 2022 09:53:19 +0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp/root-ca from root-ca (rw)
      /var/aws-root-certificate from aws-root-certificate (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8prpf (ro)
      /var/trust-ca from trust-ca (rw)
Containers:
  foundationdb:
    Container ID:  containerd://dd6c82b2b14dfbaba015ef6c235c6c5b35e57793a535f1ac33d562b9a7636396
    Image:         foundationdb/foundationdb:6.3.23
    Image ID:      docker.io/foundationdb/foundationdb@sha256:3328af0de2012fc9a07aea158240ad29db0f68afd4e5fffab6508efc867a76e0
    Port:          <none>
    Host Port:     <none>
    Command:
      backup_agent
    Args:
      --log
      --logdir
      /var/log/fdb-trace-logs
      --knob_http_request_aws_v4_header=true
      --knob_http_verbose_level=4
    State:          Running
      Started:      Fri, 27 May 2022 09:53:20 +0700
    Last State:     Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Thu, 26 May 2022 13:18:12 +0700
      Finished:     Fri, 27 May 2022 09:53:08 +0700
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     250m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  128Mi
    Environment:
      FDB_BLOB_CREDENTIALS:      /var/backup-credentials/credentials
      FDB_TLS_CERTIFICATE_FILE:  /var/fdb-certs/tls.crt
      FDB_TLS_CA_FILE:           /var/root-ca/ca.crt
      FDB_TLS_KEY_FILE:          /var/fdb-certs/tls.key
      FDB_CLUSTER_FILE:          /var/dynamic-conf/fdb.cluster
    Mounts:
      /var/backup-credentials from backup-credentials (rw)
      /var/dynamic-conf from dynamic-conf (rw)
      /var/fdb-certs from fdb-certs (rw)
      /var/log/fdb-trace-logs from logs (rw)
      /var/root-ca from root-ca (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8prpf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  backup-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-credentials
    Optional:    false
  fdb-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fdb-cluster-certificate
    Optional:    false
  trust-ca:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  root-secret
    Optional:    false
  aws-root-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-root-ca
    Optional:    false
  root-ca:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  logs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  dynamic-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config-map:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fdb-cluster-config
    Optional:  false
  kube-api-access-8prpf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

When I run fdbbackup manually from inside the backup_agent pod, it succeeds:

fdbbackup start -d "blobstore://<access_key_id>@s3.us-east-2.amazonaws.com/daily-backup?bucket=fdb-backup" --knob_http_verbose_level=4 --knob_http_request_aws_v4_header=true

But I want the operator to start the fdbbackup process automatically, as the documentation describes. I don’t know what configuration I’m missing, or where to find the logs of the fdbbackup process (I currently have no logs for it).

Any help or advice would be really appreciated.

Update:
When I check the log of the controller-manager pod, I see this:

{"level":"info","ts":1653747946.5015655,"logger":"fdbclient","msg":"Running command","namespace":"default","cluster":"fdb-cluster","path":"/usr/bin/fdb/6.3/fdbbackup","args":["/usr/bin/fdb/6.3/fdbbackup","status","--json","-C","/tmp/1408361116","--log","--logdir","/var/log/fdb"]}
{"level":"info","ts":1653747946.5921144,"logger":"fdbclient","msg":"Command completed","namespace":"default","cluster":"fdb-cluster","output":"{\"SchemaVersion\":\"1...."}
{"level":"info","ts":1653747946.6064582,"msg":"apps.foundationdb.org/v1beta1 FoundationDBBackup is deprecated; use apps.foundationdb.org/v1beta2 FoundationDBBackup\n"}
{"level":"info","ts":1653747946.6200259,"logger":"fdbclient","msg":"Running command","namespace":"default","cluster":"fdb-cluster","path":"/usr/bin/fdb/6.3/fdbbackup","args":["/usr/bin/fdb/6.3/fdbbackup","start","-d","blobstore:///fdb-cluster?bucket=fdb-backups","-s","60","-z","-C","/tmp/2352618906","--log","--logdir","/var/log/fdb"]}
{"level":"error","ts":1653747946.7001693,"logger":"fdbclient","msg":"Error from FDB command","namespace":"default","cluster":"fdb-cluster","code":10,"stdout":"ERROR: 'Backup Container URL invalid' on URL 'blobstore:///fdb-cluster?bucket=fdb-backups': host cannot be empty\nTry `/usr/bin/fdb/6.3/fdbbackup --help' for more information.\n","stderr":"","error":"exit status 10","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/log.(*DelegatingLogger).Error\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/log/deleg.go:144\ngithub.com/FoundationDB/fdb-kubernetes-operator/fdbclient.(*cliAdminClient).runCommand\n\t/workspace/fdbclient/admin_client.go:187\ngithub.com/FoundationDB/fdb-kubernetes-operator/fdbclient.(*cliAdminClient).StartBackup\n\t/workspace/fdbclient/admin_client.go:449\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.startBackup.reconcile\n\t/workspace/controllers/start_backup.go:45\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBBackupReconciler).Reconcile\n\t/workspace/controllers/backup_controller.go:87\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:99"}

Can anyone explain why the operator does not pick up the host, backupName, and bucket from blobStoreConfiguration in my FoundationDBBackup file?

What version of the operator are you running in your cluster?

"apps.foundationdb.org/v1beta1 FoundationDBBackup is deprecated; use apps.foundationdb.org/v1beta2 FoundationDBBackup\n" indicates that you’re using an operator version older than 1.0.0. If the operator is too old, it doesn’t understand the new config and will use defaults instead.

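The failing command in the log fits that explanation: the URL has no host, uses the resource name `fdb-cluster` as the backup name, and uses the bucket `fdb-backups` instead of the values from the posted spec. As a rough sketch, assuming the backup URL has the layout `blobstore://<account>/<backup name>?bucket=<bucket>` (inferred from the two URLs in this thread, not taken from the operator source), an empty account name reproduces the rejected URL. `build_backup_url` is a hypothetical helper for illustration only:

```shell
#!/bin/sh
# Hypothetical helper: not part of the operator or of fdbbackup.
# Assumes the layout blobstore://<account>/<backup name>?bucket=<bucket>.
build_backup_url() {
  printf 'blobstore://%s/%s?bucket=%s\n' "$1" "$2" "$3"
}

# Values seen in the failing operator command: empty account name.
build_backup_url "" "fdb-cluster" "fdb-backups"

# Values from the posted blobStoreConfiguration.
build_backup_url "<access_key_id>@s3.us-east-2.amazonaws.com" "daily-backup" "fdb-backup"
```

The first call prints `blobstore:///fdb-cluster?bucket=fdb-backups`, the URL that fdbbackup rejected with "host cannot be empty". The second prints the URL used in the manual `fdbbackup start` command that succeeded.
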
Hi @johscheuer, I have upgraded my operator and that part works now. But I get another error:

<Event Severity="20" Time="1653889584.368559" DateTime="2022-05-30T05:46:24Z" Type="BlobStoreEndpointRequestFailedRetryable" ID="0000000000000000" Error="lookup_failed" ErrorDescription="DNS lookup failed" ErrorCode="1041" SuppressedEventCount="0" ConnectionEstablished="0" RemoteHost="s3.us-east-2.amazonaws.com" Verb="HEAD" Resource="/ci-cd-code-deploy" ThisTry="1" RetryDelay="2" ThreadID="927849639779502412" Machine="10.42.2.19:97" LogGroup="default" />

Hi @johscheuer,
Even with all TLS disabled inside the cluster, the operator pod cannot resolve the s3.us-east-2.amazonaws.com domain. I tried telnet and nslookup, and neither can resolve it. Do I need additional configuration for the operator deployment?

You might need to add :443 to the end of the account name. I’ve seen issues in the past where DNS lookups inside the container don’t work without an explicit port, but I can’t remember exactly why.
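
For the configuration posted above, that would mean changing only `accountName` in the FoundationDBBackup spec. A minimal sketch, with the other fields copied unchanged from the original post and `<access_key_id>` left as the poster's placeholder:

```yaml
spec:
  blobStoreConfiguration:
    # Explicit port appended to the account name.
    accountName: <access_key_id>@s3.us-east-2.amazonaws.com:443
    backupName: daily-backup
    bucket: fdb-backup
```

If the URL follows the same layout as the manual `fdbbackup start` command earlier in the thread, this should correspond to `blobstore://<access_key_id>@s3.us-east-2.amazonaws.com:443/daily-backup?bucket=fdb-backup`.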

Hi @john_brownlee, thank you so much.
It works for me. It took me almost one with this error.
Thanks again :slightly_smiling_face: