OpenShift + Azure + Operator performance tweaks

Hi all,

I need some help tweaking my multi-DC cluster to achieve better performance while using the FDB Operator on OpenShift running in Azure. I'm currently running the YCSB performance tool to measure read/write I/O performance, and comparing against tests that were run in a single-cluster setup. I'm seeing around 50% of the performance compared to the single cluster, even while doubling the number of machines in use. I imagine there is something about FDB I don't understand, or at least a way to gather more information than what I have. Any assistance would help me greatly!

Multi-cluster setup

Machines: 50 in total (3 Standard_D8s_v3 masters and 47 workers; see the listing below)

NAME                                                  PHASE     TYPE              REGION    ZONE   AGE
fdb04-dev-va7-zrbmx-master-0                          Running   Standard_D8s_v3   eastus2   1      12d
fdb04-dev-va7-zrbmx-master-1                          Running   Standard_D8s_v3   eastus2   2      12d
fdb04-dev-va7-zrbmx-master-2                          Running   Standard_D8s_v3   eastus2   3      12d
fdb04-dev-va7-zrbmx-worker-eastus21-2sxzp             Running   Standard_L8s_v3   eastus2   1      22m
fdb04-dev-va7-zrbmx-worker-eastus21-667hw             Running   Standard_L8s_v3   eastus2   1      22m
fdb04-dev-va7-zrbmx-worker-eastus21-68cc6             Running   Standard_L8s_v3   eastus2   1      22m
fdb04-dev-va7-zrbmx-worker-eastus21-6jjm7             Running   Standard_L8s_v3   eastus2   1      22m
...

With the relevant MachineSet YAML (I have removed anything obviously non-relevant):

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/GPU: "0"
    machine.openshift.io/memoryMb: "65536"
    machine.openshift.io/vCPU: "8"
spec:
  replicas: 19
  template:
    spec:
      providerSpec:
        value:
          acceleratedNetworking: true
          apiVersion: machine.openshift.io/v1beta1
          image:
            offer: aro4
            publisher: azureopenshift
          kind: AzureMachineProviderSpec
          location: eastus2
          osDisk:
            diskSettings: {}
            diskSizeGB: 128
            managedDisk:
              storageAccountType: Premium_LRS
            osType: Linux

We have this exact setup replicated twice, because we were also DR testing and needed separate MachineSets to bring up and down. In total the cluster has 50 machines: 3 masters and 47 workers.
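
Bringing a DC's worker MachineSets down and back up for the DR tests is just a scale operation; roughly like the following sketch (the MachineSet name is inferred from the machine names above, so treat it as an assumption):

oc scale machineset fdb04-dev-va7-zrbmx-worker-eastus21 --replicas=0 -n openshift-machine-api
# ...and back up again when the DC should return:
oc scale machineset fdb04-dev-va7-zrbmx-worker-eastus21 --replicas=19 -n openshift-machine-api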

The relevant FDB setup: we have three DCs. DC1 and DC3 are the main (primary/remote) datacenters, whereas DC2 is only used for replication (satellite logs). This is the FDB config for DC1 and DC3 (with namespaces changed as appropriate):

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: fdb-cluster-1
  namespace: dc1
spec:
  # coordinatorSelection:
  # - priority: 0
  #   processClass: coordinator
  # - priority: 5
  #   processClass: storage 
  # - priority: 10
  #   processClass: log
  storageServersPerPod: 2
  logServersPerPod: 2
  automationOptions:
    replacements:
      enabled: true
  processGroupIDPrefix: "dc1"
  faultDomain:
    key: kubernetes.io/hostname
    valueFrom: spec.nodeName
  dataCenter: "dc1"
  seedConnectionString: $connectionString
  databaseConfiguration:
    redundancy_mode: "triple"
    usable_regions: 2
    regions:
      - datacenters:
          - id: dc1
            priority: 1
          - id: dc2
            satellite: 1
            priority: 2
          - id: dc3
            satellite: 1
            priority: 1
        satellite_logs: 3
      - datacenters:
          - id: dc3
            priority: 0
          - id: dc2
            satellite: 1
            priority: 2
          - id: dc1
            satellite: 1
            priority: 1
        satellite_logs: 3
  labels:
    filterOnOwnerReference: false
    matchLabels:
      foundationdb.org/fdb-cluster-name: fdb-cluster-1
    processClassLabels:
      - foundationdb.org/fdb-process-class
    processGroupIDLabels:
      - foundationdb.org/fdb-process-group-id
  minimumUptimeSecondsForBounce: 60
  # The storage process counts here are directly comparable to our single setup,
  # which is comparable to a test setup we had years ago. I'd be willing to change
  # log/stateless/coordinator, but as long as there isn't anything wrong with
  # 17 storage servers, we'd like to keep this the same.
  processCounts:
    storage: 17
    log: 2
    stateless: 3
    # coordinator: 3
  processes:
    log:
      volumeClaimTemplate:
        spec:
          storageClassName: fdb-nvme 
          resources:
            requests:
              storage: 64G
      customParameters:
        - knob_disable_posix_kernel_aio=1
      podTemplate:
        spec:
          nodeSelector:
            topology.kubernetes.io/zone: eastus2-1
            beta.kubernetes.io/instance-type: Standard_L8s_v3
          serviceAccountName: fdb-dc1 
          securityContext:
            allowPrivilegeEscalation: false
          hostNetwork: true
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                foundationdb.org/fdb-cluster-name: fdb-cluster-1
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: 6500m # above ~6500m (of the node's 8000m) we had trouble scheduling
                  memory: 31Gi
              securityContext:
                runAsUser: 0
            - name: foundationdb-kubernetes-sidecar
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: fdb-nvme 
          resources:
            requests:
              storage: 64G
      customParameters:
        - knob_disable_posix_kernel_aio=1
      podTemplate:
        spec:
          nodeSelector:
            beta.kubernetes.io/instance-type: Standard_L8s_v3
          serviceAccountName: fdb-dc1 
          securityContext:
            allowPrivilegeEscalation: false
          hostNetwork: true
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                foundationdb.org/fdb-cluster-name: fdb-cluster-1
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: 6500m
                  memory: 31Gi
              securityContext:
                runAsUser: 0
            - name: foundationdb-kubernetes-sidecar
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
    general:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 16G
      customParameters:
        - knob_disable_posix_kernel_aio=1
      podTemplate:
        spec:
          serviceAccountName: fdb-dc1 
          securityContext:
            allowPrivilegeEscalation: false
          nodeSelector:
            topology.kubernetes.io/zone: eastus2-1 
            beta.kubernetes.io/instance-type: Standard_D4s_v3
          hostNetwork: true
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                foundationdb.org/fdb-cluster-name: fdb-cluster-1
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: 3000m
                  memory: 10Gi
              securityContext:
                runAsUser: 0
            - name: foundationdb-kubernetes-sidecar
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
  routing:
    headlessService: true
    publicIPSource: pod 
  sidecarContainer:
    enableLivenessProbe: true 
    enableReadinessProbe: true  
  useExplicitListenAddress: true
  version: 7.1.26
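
The seedConnectionString above is a redacted placeholder. For completeness, one way to read the live connection string back out of a running cluster (e.g. to seed the other DCs) is from the cluster resource's status; as far as I know the operator keeps it in .status.connectionString:

oc get foundationdbcluster fdb-cluster-1 -n dc1 -o jsonpath='{.status.connectionString}'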

And DC2:

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: fdb-cluster-2
  namespace: dc2
spec:
  # coordinatorSelection:
  # - priority: 0
  #   processClass: coordinator
  # - priority: 5
  #   processClass: storage 
  # - priority: 10
  #   processClass: log  
  storageServersPerPod: 2
  logServersPerPod: 2
  automationOptions:
    replacements:
      enabled: true
  faultDomain:
    key: kubernetes.io/hostname
    valueFrom: spec.nodeName
  processGroupIDPrefix: "dc2"
  dataCenter: "dc2"
  seedConnectionString: $connectionString
  databaseConfiguration:
    redundancy_mode: "double"
    usable_regions: 2
    regions:
      - datacenters:
          - id: dc1
            priority: 1
          - id: dc2
            satellite: 1
            priority: 2
          - id: dc3
            satellite: 1
            priority: 1
        satellite_logs: 3
      - datacenters:
          - id: dc3
            priority: 0
          - id: dc2
            satellite: 1
            priority: 2
          - id: dc1
            satellite: 1
            priority: 1
        satellite_logs: 3
  labels:
    filterOnOwnerReference: false
    matchLabels:
      foundationdb.org/fdb-cluster-name: fdb-cluster-2
    processClassLabels:
      - foundationdb.org/fdb-process-class
    processGroupIDLabels:
      - foundationdb.org/fdb-process-group-id
  minimumUptimeSecondsForBounce: 60
  processCounts:
    storage: -1
    log: 3
    stateless: -1
    # coordinator: 3
  processes:
    log:
      volumeClaimTemplate:
        spec:
          storageClassName: fdb-nvme 
          resources:
            requests:
              storage: 64G
      customParameters:
        - knob_disable_posix_kernel_aio=1
      podTemplate:
        spec:
          nodeSelector:
            topology.kubernetes.io/zone: eastus2-2 
            beta.kubernetes.io/instance-type: Standard_L8s_v3
          serviceAccountName: fdb-dc2 
          securityContext:
            allowPrivilegeEscalation: false
          hostNetwork: true
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                foundationdb.org/fdb-cluster-name: fdb-cluster-2
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: 6500m
                  memory: 31Gi
              securityContext:
                runAsUser: 0
            - name: foundationdb-kubernetes-sidecar
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
    # storage:
    #   volumeClaimTemplate:
    #     spec:
    #       storageClassName: fdb-nvme 
    #       resources:
    #         requests:
    #           storage: 16G
    #   customParameters:
    #     - knob_disable_posix_kernel_aio=1
    #   podTemplate:
    #     spec:
    #       nodeSelector:
    #         topology.kubernetes.io/zone: eastus2-2 
    #         beta.kubernetes.io/instance-type: Standard_L8s_v3
    #       serviceAccountName: fdb-dc2
    #       securityContext:
    #         allowPrivilegeEscalation: false
    #       hostNetwork: true
    #       topologySpreadConstraints:
    #       - maxSkew: 1
    #         topologyKey: kubernetes.io/hostname
    #         whenUnsatisfiable: DoNotSchedule
    #         labelSelector:
    #           matchLabels:
    #             foundationdb.org/fdb-cluster-name: fdb-cluster-2
    #       containers:
    #         - name: foundationdb
    #           resources:
    #             requests:
    #               cpu: 100m
    #               memory: 128Mi
    #           securityContext:
    #             runAsUser: 0
    #         - name: foundationdb-kubernetes-sidecar
    #           resources:
    #             limits:
    #               cpu: 100m
    #               memory: 128Mi
    #             requests:
    #               cpu: 100m
    #               memory: 128Mi
    #           securityContext:
    #             runAsUser: 0
    #       initContainers:
    #         - name: foundationdb-kubernetes-init
    #           resources:
    #             limits:
    #               cpu: 100m
    #               memory: 128Mi
    #             requests:
    #               cpu: 100m
    #               memory: 128Mi
    #           securityContext:
    #             runAsUser: 0
    # general:
    #   volumeClaimTemplate:
    #     spec:
    #       resources:
    #         requests:
    #           storage: 16G
    #   customParameters:
    #     - knob_disable_posix_kernel_aio=1
    #   podTemplate:
    #     spec:
    #       serviceAccountName: fdb-dc2
    #       securityContext:
    #         allowPrivilegeEscalation: false
    #       nodeSelector:
    #         topology.kubernetes.io/zone: eastus2-2
    #         beta.kubernetes.io/instance-type: Standard_D4s_v3
    #       hostNetwork: true
    #       topologySpreadConstraints:
    #       - maxSkew: 1
    #         topologyKey: kubernetes.io/hostname
    #         whenUnsatisfiable: DoNotSchedule
    #         labelSelector:
    #           matchLabels:
    #             foundationdb.org/fdb-cluster-name: fdb-cluster-2
    #       containers:
    #         - name: foundationdb
    #           resources:
    #             requests:
    #               cpu: 100m
    #               memory: 128Mi
    #           securityContext:
    #             runAsUser: 0
    #         - name: foundationdb-kubernetes-sidecar
    #           resources:
    #             limits:
    #               cpu: 100m
    #               memory: 128Mi
    #             requests:
    #               cpu: 100m
    #               memory: 128Mi
    #           securityContext:
    #             runAsUser: 0
    #       initContainers:
    #         - name: foundationdb-kubernetes-init
    #           resources:
    #             limits:
    #               cpu: 100m
    #               memory: 128Mi
    #             requests:
    #               cpu: 100m
    #               memory: 128Mi
    #           securityContext:
    #             runAsUser: 0
  routing:
    headlessService: true
    publicIPSource: pod 
  sidecarContainer:
    enableLivenessProbe: true 
    enableReadinessProbe: true  
  useExplicitListenAddress: true
  version: 7.1.26

As you can see, DC2 is just logs, and that is where we run YCSB.
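
As a quick sanity check that DC2 really only runs log (and satellite log) processes, the pods can be listed with the process-class label from the spec above:

oc get pods -n dc2 -L foundationdb.org/fdb-process-class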

The single-cluster FDB setup (the baseline we're comparing against) is as follows:

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: fdb-cluster
  namespace: default
spec:
  storageServersPerPod: 8
  automationOptions:
    replacements:
      enabled: true
  faultDomain:
    key: foundationdb.org/none
  labels:
    podlabel: fdblabel
    filterOnOwnerReference: false
    matchLabels:
      foundationdb.org/fdb-cluster-name: fdb-cluster
    processClassLabels:
    - foundationdb.org/fdb-process-class
    processGroupIDLabels:
    - foundationdb.org/fdb-process-group-id
  minimumUptimeSecondsForBounce: 60
  processCounts:
    cluster_controller: 1
    storage: 17
    log: 2
    stateless: 4
  processes:
    general:
      podTemplate:
        spec:
          hostNetwork: true
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                foundationdb.org/fdb-cluster-name: fdb-cluster
          containers:
          - name: foundationdb
            resources:
              requests:
                cpu: 7000m
                memory: 31Gi
            securityContext:
              runAsUser: 0
          - name: foundationdb-kubernetes-sidecar
            resources:
              limits:
                cpu: 100m
                memory: 128Mi
              requests:
                cpu: 100m
                memory: 128Mi
            securityContext:
              runAsUser: 0
          initContainers:
          - name: foundationdb-kubernetes-init
            resources:
              limits:
                cpu: 100m
                memory: 128Mi
              requests:
                cpu: 100m
                memory: 128Mi
            securityContext:
              runAsUser: 0
      customParameters:
      - knob_disable_posix_kernel_aio=1
      volumeClaimTemplate:
        spec:
          storageClassName: fdb-nvme
          resources:
            requests:
              storage: 64G
  routing:
    headlessService: true
    publicIPSource: pod
  sidecarContainer:
    enableLivenessProbe: true
    enableReadinessProbe: true
  databaseConfiguration:
    redundancy_mode: triple
  useExplicitListenAddress: true
  version: 7.1.26

My fdbcli output is mostly irrelevant right now, as we have some failures in it from DR testing, but some relevant sections are as follows:

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 9
  Exclusions             - 4 (type `exclude' for details)
  Desired Commit Proxies - 2
  Desired GRV Proxies    - 1
  Desired Resolvers      - 1
  Desired Logs           - 3
  Desired Remote Logs    - 3
  Desired Log Routers    - 3
  Usable Regions         - 2
  Regions: 
    Remote -
        Datacenter                    - dc1
        Satellite datacenters         - dc2, dc3
        Satellite Logs                - 3
    Primary -
        Datacenter                    - dc3
        Satellite datacenters         - dc2, dc1
        Satellite Logs                - 3

...

Disk space used        - 4.288 TB

...

<no logs are currently relevant>
<Coordination servers are all reachable>
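
If more detail would help, I can also pull things out of status json; for example (a sketch, with jq paths based on my reading of the 7.1 status document):

# From any fdb pod in the cluster (pod name is a placeholder):
oc exec -n dc1 <fdb-pod> -c foundationdb -- fdbcli --exec 'status json' > status.json
# Commit/read/GRV latency probes:
jq '.cluster.latency_probe' status.json
# Roles per process (handy to confirm where the logs, satellite logs and proxies landed):
jq '.cluster.processes[] | {address, class_type, roles: [.roles[].role]}' status.json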

We're running YCSB as a container on the same cluster, in DC2. I have a quick templating program I wrote that inserts these values into the YAML for fast reconfiguration and automated testing (a rough sketch of that kind of templating is included after the values below).

YCSB values:

"ycsb-statefulset": {
      "replicas": 20,
      "num_keys": 100000000,
      "process_per_host": 8,
      "threads_per_process": 32,
      "batch_size": 100,
      "update_proportion": 0.0,
      "read_proportion": 1.0,
      "max_execution_time_seconds": 200000
    },
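
The templating itself is nothing fancy; a minimal sketch of the idea (file names and placeholder variables here are hypothetical, not our actual tooling):

# Pull values out of the JSON config and substitute them into a manifest template
# that contains ${REPLICAS}, ${PROCESS_PER_HOST}, ... markers.
export REPLICAS=$(jq -r '."ycsb-statefulset".replicas' ycsb-values.json)
export PROCESS_PER_HOST=$(jq -r '."ycsb-statefulset".process_per_host' ycsb-values.json)
export THREADS_PER_PROCESS=$(jq -r '."ycsb-statefulset".threads_per_process' ycsb-values.json)
envsubst < ycsb-statefulset.template.yaml > ycsb-statefulset.yaml
oc apply -n dc2 -f ycsb-statefulset.yaml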

We mostly tweak replicas, process_per_host, and threads_per_process; we keep batch_size the same and are currently doing read-only tests, but nothing major changes at a 90/10 ratio. A num_keys of 100 million was enough that we didn't see super-spiky loads (i.e. results that come back much faster than we'd expect), while still not taking too long to test (about 30 minutes).
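
Just to spell out the client-side concurrency those values imply (plain arithmetic on the numbers above):

# replicas * process_per_host * threads_per_process
echo $((20 * 8 * 32))   # = 5120 concurrent YCSB client threads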

YCSB script that uses those values:

host_index=$(echo $HOSTNAME | grep -o '[0-9]\+')
cd ycsb-foundationdb-binding-0.18.0-SNAPSHOT
now=$(date +"%Y_%m_%d_%H_%M_%S")
log_file="ycsb_host_${host_index}__$now.log"
workload='workloads/workloada'
update_proportion=0.1
read_proportion=0.9
num_keys=100000000
value_size_bytes=2000
batch_size=100
operation_count=$((num_keys / batch_size))
num_clients=30
field_count=10
field_length=100
threads_per_process=32
max_execution_time_seconds=200000
keys_per_host=$((num_keys / num_clients))
process_per_host=16
keys_per_process=$((keys_per_host / process_per_host))
start_key_offset=0
start_key_per_host=$((host_index*keys_per_host+start_key_offset))
while true; do
  echo "Waiting for cluster file to be created"
  if [ -f $FDB_CLUSTER_FILE ]; then
    break
  fi
  sleep 5
done
ycsb_load_cmd="./bin/ycsb load foundationdb -s \
  -P $workload \
  -p foundationdb.apiversion=620 \
  -p foundationdb.clusterfile=$FDB_CLUSTER_FILE \
  -p recordcount=$num_keys \
  -p insertstart=$start_key_per_host \
  -p insertcount=$keys_per_host \
  -p operationcount=$operation_count \
  -p maxexecutiontime=$max_execution_time_seconds \
  -p fieldcount=$field_count \
  -p fieldlength=$field_length \
  -p readproportion=$read_proportion \
  -p updateproportion=$update_proportion \
  -p requestdistribution=uniform \
  -p threadcount=$threads_per_process"
ycsb_run_cmd="./bin/ycsb run foundationdb -s \
  -P $workload \
  -p foundationdb.apiversion=620 \
  -p foundationdb.clusterfile=$FDB_CLUSTER_FILE \
  -p recordcount=$num_keys \
  -p insertstart=$start_key_per_host \
  -p insertcount=$keys_per_host \
  -p operationcount=$operation_count \
  -p maxexecutiontime=$max_execution_time_seconds \
  -p fieldcount=$field_count \
  -p fieldlength=$field_length \
  -p readproportion=$read_proportion \
  -p updateproportion=$update_proportion \
  -p requestdistribution=uniform \
  -p threadcount=$threads_per_process"
sh -c "$ycsb_load_cmd"
sh -c "$ycsb_run_cmd"
while true; do
  echo "finished... host $HOSTNAME"
  sleep 20
done

I know the YCSB binding for FDB hasn't been updated in a while; it's using an old API version (620) while the cluster itself is up to date (7.1.26), but it's the best we can do to keep the results comparable to the tests we ran years ago.

I can provide any more information if needed. Any help/ideas are much appreciated.

Thanks in advance

Doing a commit requires pushing a message to all log/transaction processes (see FDB Read and Write Path — FoundationDB 7.1 for more details). That means every commit incurs cross-AZ latencies (What are Azure availability zones? | Microsoft Learn), and the slowest response is the limit on how fast you can commit. I'm not sure about the exact inter-AZ latencies, but those tend to be much smaller.
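
A quick way to get a feel for those inter-AZ round trips is to ping between worker nodes in different zones; a sketch (the node name and target IP are placeholders):

# From a debug shell on a node in zone 1, ping a node in zone 2:
oc debug node/<node-in-eastus2-1> -- chroot /host ping -c 20 <ip-of-node-in-eastus2-2>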