Operator crashes with `illegal instruction`

Hey, I was playing with the FDB operator and found a minor but confusing issue. I can’t report it on GitHub, so I’m reporting it here:

I was following the getting started guide and got stuck on the FDB operator crashing with the following confusing error:

{"level":"info","ts":1675163849.8685071,"logger":"fdbclient","msg":"Fetch values from FDB","namespace":"fdb","cluster":"test-cluster","key":"\ufffd\ufffd/status/json"}
SIGILL: illegal instruction
PC=0x7fdabe0a74ff m=4 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc5 0xf9 0xef 0xc0 0x48 0x8d 0x15 0xb6 0xb 0x8 0x1 0x48 0x8d 0x35 0x8f 0xc0

The instruction bytes disassemble to the following, but it’s probably irrelevant:

0:  c5 f9 ef c0             vpxor  xmm0,xmm0,xmm0
4:  48 8d 15 b6 0b 08 01    lea    rdx,[rip+0x1080bb6]        # 0x1080bc1
b:  48                      rex.W
c:  8d                      .byte 0x8d
d:  35                      .byte 0x35
e:  8f c0                   pop    rax

This was on a modern-ish low-power Intel CPU (I’m fairly sure it has SSE).

Turns out that, like many things, cgo doesn’t handle running out of memory well, and increasing the memory limit to as little as 280Mi alleviated the problem. I don’t even know how to approach fixing the root cause, but I’d suggest increasing the memory limit in the sample config (deployment.yaml on the main branch of FoundationDB/fdb-kubernetes-operator on GitHub) so other poor saps don’t spend time debugging this.

Repro:

deployment.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fdb-kubernetes-operator-controller-manager
  namespace: fdb
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: null
  name: fdb-kubernetes-operator-manager-role
  namespace: fdb
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - events
  - persistentvolumeclaims
  - pods
  - secrets
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbbackups
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbbackups/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbclusters
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbclusters/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbrestores
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.foundationdb.org
  resources:
  - foundationdbrestores/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: null
  name: fdb-kubernetes-operator-manager-rolebinding
  namespace: fdb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: fdb-kubernetes-operator-manager-role
subjects:
- kind: ServiceAccount
  name: fdb-kubernetes-operator-controller-manager
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fdb-kubernetes-operator-controller-manager
    control-plane: controller-manager
  name: fdb-kubernetes-operator-controller-manager
  namespace: fdb
  annotations:
    kube-score/ignore: "container-security-context,pod-probes,pod-networkpolicy,container-security-context-readonlyrootfilesystem,container-security-context-user-group-id,container-resources"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fdb-kubernetes-operator-controller-manager
  template:
    metadata:
      labels:
        app: fdb-kubernetes-operator-controller-manager
        control-plane: controller-manager
    spec:
      containers:
      - command:
        - /manager
        env:
        - name: WATCH_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: foundationdb/fdb-kubernetes-operator:v1.12.0
        name: manager
        ports:
        - containerPort: 8080
          name: metrics
        resources:
          requests:
            cpu: 500m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /tmp
          name: tmp
        - mountPath: /var/log/fdb
          name: logs
        - mountPath: /usr/bin/fdb
          name: fdb-binaries
      initContainers:
      - args:
        - --copy-library
        - "6.2"
        - --copy-binary
        - fdbcli
        - --copy-binary
        - fdbbackup
        - --copy-binary
        - fdbrestore
        - --output-dir
        - /var/output-files/6.2.30
        - --init-mode
        image: foundationdb/foundationdb-kubernetes-sidecar:6.2.30-1
        name: foundationdb-kubernetes-init-6-2
        volumeMounts:
        - mountPath: /var/output-files
          name: fdb-binaries
      - args:
        - --copy-library
        - "6.3"
        - --copy-binary
        - fdbcli
        - --copy-binary
        - fdbbackup
        - --copy-binary
        - fdbrestore
        - --output-dir
        - /var/output-files/6.3.24
        - --init-mode
        image: foundationdb/foundationdb-kubernetes-sidecar:6.3.24-1
        name: foundationdb-kubernetes-init-6-3
        volumeMounts:
        - mountPath: /var/output-files
          name: fdb-binaries
      - args:
        - --copy-library
        - "7.1"
        - --copy-binary
        - fdbcli
        - --copy-binary
        - fdbbackup
        - --copy-binary
        - fdbrestore
        - --output-dir
        - /var/output-files/7.1.15
        - --init-mode
        image: foundationdb/foundationdb-kubernetes-sidecar:7.1.15-1
        name: foundationdb-kubernetes-init-7-1
        volumeMounts:
        - mountPath: /var/output-files
          name: fdb-binaries
      securityContext:
        fsGroup: 4059
        runAsGroup: 4059
        runAsUser: 4059
      serviceAccountName: fdb-kubernetes-operator-controller-manager
      terminationGracePeriodSeconds: 10
      volumes:
      - emptyDir: {}
        name: tmp
      - emptyDir: {}
        name: logs
      - emptyDir: {}
        name: fdb-binaries

cluster.yaml (taken from https://raw.githubusercontent.com/foundationdb/fdb-kubernetes-operator/main/config/samples/cluster.yaml with some minor modifications)

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: test-cluster
  namespace: fdb
spec:
  automationOptions:
    replacements:
      enabled: true
    podUpdateStrategy: "Replace"
  faultDomain:
    key: foundationdb.org/none
  labels:
    filterOnOwnerReference: false
    matchLabels:
      foundationdb.org/fdb-cluster-name: test-cluster
    processClassLabels:
    - foundationdb.org/fdb-process-class
    processGroupIDLabels:
    - foundationdb.org/fdb-process-group-id
  minimumUptimeSecondsForBounce: 60
  # processCounts:
  #   cluster_controller: 1
  #   storage: 3
  #   log: 3
  #   #stateless: 4
  #   stateless: -1
  # https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/manual/scaling.md#growing-a-cluster
  databaseConfiguration:
    storage: 3
    logs: 1 # default is 3
    proxies: 1 # default is 3
    resolvers: 1 # default is 1
    redundancy_mode: single
  processes:
    general:
      customParameters:
      - knob_disable_posix_kernel_aio=1
      podTemplate:
        spec:
          containers:
          - name: foundationdb
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
              limits:
                cpu: 100m
                memory: 170Mi
            securityContext:
              runAsUser: 0
          - name: foundationdb-kubernetes-sidecar
            resources:
              limits:
                cpu: 100m
                memory: 128Mi
              requests:
                cpu: 100m
                memory: 128Mi
            securityContext:
              runAsUser: 0
          initContainers:
          - name: foundationdb-kubernetes-init
            resources:
              limits:
                cpu: 100m
                memory: 128Mi
              requests:
                cpu: 100m
                memory: 128Mi
            securityContext:
              runAsUser: 0
      volumeClaimTemplate:
        spec:
          # RBD would have worked better
          storageClassName: rook-cephfs
          resources:
            requests:
              storage: 16G
  routing:
    headlessService: true
  sidecarContainer:
    enableLivenessProbe: true
    enableReadinessProbe: false
  useExplicitListenAddress: true
  version: 6.3.22

(Note: I also had to raise the memory limit for the foundationdb container.)

Could you check whether your CPU supports AVX instructions? If it doesn’t, please change the image foundationdb/foundationdb-kubernetes-sidecar:7.1.15-1 to foundationdb/foundationdb-kubernetes-sidecar:7.1.14-1 (the even-numbered patch versions are built without AVX support); that should solve your problem. We should probably document that somewhere more visible.
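
One way to run that check on a Linux node is to look for the `avx` flag in `/proc/cpuinfo`. The following is a minimal Python sketch, assuming a Linux x86 host. The helper names `cpu_flags` and `has_avx` are made up for illustration and are not part of the operator or FoundationDB.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list the feature flags under the "flags" key.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text: str) -> bool:
    """True if the first listed CPU advertises the exact 'avx' flag."""
    # Set membership matches whole tokens, so 'avx2' alone does not count.
    return "avx" in cpu_flags(cpuinfo_text)
```

On the node itself this could be called as `has_avx(open("/proc/cpuinfo").read())`. It is worth running it on the node where the pod is actually scheduled, since different nodes in the same cluster may have different CPUs.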

Ah, indeed. I checked for SSE but not AVX.
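
The crash log is consistent with this: the faulting instruction bytes start with `0xc5`, which in 64-bit mode is the two-byte VEX prefix used to encode AVX instructions such as `vpxor`. Below is a rough Python sketch of that check, assuming 64-bit x86 code and looking only at the first byte. It is not a disassembler, the function names are hypothetical, and since a few non-AVX extensions also use VEX encoding, a match is a strong hint rather than proof.

```python
def parse_instruction_bytes(dump: str) -> bytes:
    """Parse a dump such as '0xc5 0xf9 0xef 0xc0' into raw bytes."""
    return bytes(int(token, 16) for token in dump.split())


def starts_with_vex_prefix(code: bytes) -> bool:
    """True if the first byte is a VEX prefix (0xC5 two-byte, 0xC4 three-byte).

    Only meaningful for 64-bit code, where these bytes cannot be LDS/LES.
    """
    return len(code) > 0 and code[0] in (0xC4, 0xC5)


# Instruction bytes copied from the SIGILL report in this thread.
dump = "0xc5 0xf9 0xef 0xc0 0x48 0x8d 0x15 0xb6 0xb 0x8 0x1 0x48 0x8d 0x35 0x8f 0xc0"
print(starts_with_vex_prefix(parse_instruction_bytes(dump)))  # prints True
```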

The host it was originally running on does not support AVX. After I bumped the memory limit, the pod moved to a different host that does support AVX (and I didn’t notice).