CPU limited storage processes

I am running a cluster with 10 storage processes and 4 log processes (one of which remains idle), each on its own machine. Each machine in the cluster has 4 cores, 12GB of RAM, and one local NVMe SSD. This setup can't handle more than 90K writes per second. All the storage processes in the cluster seem to be CPU limited: the load on the core that fdbserver is assigned to is close to 100%, especially when I increase the write rate, while disk IO stays relatively low, never going beyond 30%. The log processes seem fine, at around 10% CPU load and 70-80% disk IO. After getting feedback from several discussions here (Scalable time series database / Improving write throughput), I have tried to optimize my writes by eliminating the need for reads in the same transaction (by using atomic ops), and at the same time I keep my write transactions in the 10KB range. Do you have any recommendations on where I should focus my optimization efforts?
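For anyone curious about the atomic-op trick mentioned above: FoundationDB's ADD mutation lets you bump a counter without reading it first, which removes both the read round-trip and the read conflict range from the transaction (with the Python bindings this is roughly `tr.add(key, struct.pack('<q', 1))` instead of a get followed by a set). Below is a stdlib-only sketch of the server-side ADD semantics as I understand them (operand and stored value treated as little-endian integers, added with wraparound, result truncated/extended to the operand's width); this is an illustration, not the actual fdbserver code.

```python
import struct

def atomic_add(stored: bytes, operand: bytes) -> bytes:
    """Simulate FoundationDB's ADD atomic op (sketch).

    Both inputs are little-endian integers; the stored value is
    zero-extended or truncated to the operand's width, and the sum
    wraps around at that width.
    """
    width = len(operand)
    a = int.from_bytes(stored.ljust(width, b'\x00')[:width], 'little')
    b = int.from_bytes(operand, 'little')
    total = (a + b) % (1 << (8 * width))
    return total.to_bytes(width, 'little')

# An 8-byte counter, matching the 8-byte values in this workload:
v = struct.pack('<q', 41)
v = atomic_add(v, struct.pack('<q', 1))
print(struct.unpack('<q', v)[0])  # -> 42
```

Because the client never reads the key, concurrent increments to the same counter also stop conflicting with each other.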

Do your storage server processes have any other roles recruited on them? (see Roles details in log files)

How large are your keys and values?

What is your replication factor?

  • Yes, they have other roles recruited on them. In a previous deployment I had set the stateless process count to some value in the Kubernetes operator, but those machines were mostly idle, so in the next deployment I set it to -1.

  • The keys are 60-100 bytes in size, the values are 8 bytes

  • The replication factor is 2
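Some rough back-of-envelope arithmetic with these numbers (a sketch, using the upper end of the stated key size) shows why the disks would be mostly idle: the logical key-value write bandwidth at 90K writes/s is tiny, so the per-operation CPU cost on the storage servers, not disk bandwidth, is the plausible limit.

```python
# Rough arithmetic on the numbers from this thread (sketch):
writes_per_sec = 90_000
key_bytes = 100      # upper end of the reported 60-100 byte range
value_bytes = 8
replication = 2      # redundancy_mode: double

bytes_per_sec = writes_per_sec * (key_bytes + value_bytes) * replication
print(bytes_per_sec / 1e6)  # -> 19.44 (MB/s of logical key-value writes)
```

Even with write amplification from the storage engine, that is far below what a local NVMe SSD can sustain.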

Each fdbserver process is essentially single-threaded, so a typical deployment runs two storage server processes (two cores) against one disk for better disk IOPS utilization.

apiVersion: v1
items:
- apiVersion: apps.foundationdb.org/v1beta1
  kind: FoundationDBCluster
  metadata:
    creationTimestamp: "2020-05-26T13:53:06Z"
    generation: 11
    labels:
      app: fdb
      chart: mist-0.1.0
      controller-tools.k8s.io: "1.0"
      heritage: Tiller
      release: dogfood2
    name: dogfood2-mist-fdb
    namespace: mist2
    resourceVersion: "361948695"
    selfLink: /apis/apps.foundationdb.org/v1beta1/namespaces/mist2/foundationdbclusters/dogfood2-mist-fdb
    uid: 5a69cdfd-3c91-44aa-9164-ec39134efcec
  spec:
    automationOptions: {}
    configured: true
    connectionString: dogfood2_mist_fdb:08zDDL09MjHx6c5FT8jZn6yV0Z4kdoBg@10.112.51.8:4501,10.112.60.4:4501,10.112.55.5:4501
    customParameters:
    - knob_disable_posix_kernel_aio=1
    databaseConfiguration:
      redundancy_mode: double
      storage: 14
      storage_engine: ssd
      usable_regions: 1
    faultDomain:
      key: foundationdb.org/none
    mainContainer: {}
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - name: foundationdb
          resources:
            limits:
              cpu: 3500m
              memory: 8Gi
            requests:
              cpu: 2000m
              memory: 4Gi
          securityContext:
            runAsUser: 0
        - name: foundationdb-kubernetes-sidecar
          securityContext:
            runAsUser: 0
        initContainers:
        - name: foundationdb-kubernetes-init
          resources: {}
          securityContext:
            runAsUser: 0
        securityContext:
          fsGroup: 0
        tolerations:
        - effect: NoSchedule
          key: dedicated
          operator: Equal
          value: fdb
    processCounts:
      stateless: -1
    runningVersion: 6.2.19
    sidecarContainer: {}
    storageClass: local-scsi
    version: 6.2.19
  status:
    databaseConfiguration:
      log_routers: -1
      log_spill: 2
      logs: 3
      proxies: 3
      redundancy_mode: double
      remote_logs: -1
      resolvers: 1
      storage_engine: ssd-2
      usable_regions: 1
    generations:
      reconciled: 11
    health:
      available: true
      fullReplication: true
    processCounts:
      log: 4
      storage: 14
    requiredAddresses:
      nonTLS: true
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

As you can see, the recruited processes have plenty of CPU cores to work with (2-4 cores), but from my measurements only one of those cores is being utilized. This is true even when that core's load is close to or at 100%.

I did an experiment in which I allowed the FoundationDB cluster to recruit as many stateless processes as it wants. But as you can see, the storage processes are still CPU limited. I don't really know why that's the case, so any input would be valuable.

  Redundancy mode        - double
  Storage engine         - ssd-2
  Coordinators           - 3
  Desired Proxies        - 3
  Desired Resolvers      - 1
  Desired Logs           - 3

Cluster:
  FoundationDB processes - 26
  Zones                  - 26
  Machines               - 26
  Memory availability    - 8.3 GB per process on machine with least available
  Retransmissions rate   - 30 Hz
  Fault Tolerance        - 1 machine
  Server time            - 06/10/20 17:12:06

Data:
  Replication health     - Healthy (Repartitioning)
  Moving data            - 168.178 GB
  Sum of key-value sizes - 625.783 GB
  Disk space used        - 1.575 TB

Operating space:
  Storage server         - 242.2 GB free on most full server
  Log server             - 354.4 GB free on most full server

Workload:
  Read rate              - 4430 Hz
  Write rate             - 73889 Hz
  Transactions started   - 692 Hz
  Transactions committed - 646 Hz
  Conflict rate          - 1 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  10.112.12.222:4501     (  5% cpu; 47% machine; 0.001 Gbps;  0% disk IO; 0.3 GB / 8.3 GB RAM  )
  10.112.25.85:4501      ( 29% cpu; 28% machine; 0.078 Gbps;  0% disk IO; 0.4 GB / 8.8 GB RAM  )
  10.112.28.202:4501     ( 22% cpu; 40% machine; 0.079 Gbps;  0% disk IO; 0.4 GB / 10.4 GB RAM  )
  10.112.42.223:4501     ( 18% cpu; 11% machine; 0.080 Gbps;  0% disk IO; 0.3 GB / 9.9 GB RAM  )
  10.112.49.139:4501     (  5% cpu; 50% machine; 0.001 Gbps;  1% disk IO; 0.3 GB / 10.1 GB RAM  )
  10.112.51.8:4501       ( 65% cpu; 17% machine; 0.013 Gbps; 27% disk IO; 3.3 GB / 14.1 GB RAM  )
  10.112.52.28:4501      ( 83% cpu; 23% machine; 0.023 Gbps; 28% disk IO; 2.7 GB / 14.1 GB RAM  )
  10.112.53.5:4501       ( 18% cpu;  6% machine; 0.064 Gbps; 85% disk IO; 1.1 GB / 14.1 GB RAM  )
  10.112.54.5:4501       ( 18% cpu;  6% machine; 0.057 Gbps; 90% disk IO; 1.0 GB / 14.1 GB RAM  )
  10.112.55.10:4501      (111% cpu; 30% machine; 0.026 Gbps; 27% disk IO; 3.5 GB / 14.1 GB RAM  )
  10.112.56.10:4501      (  0% cpu;  2% machine; 0.000 Gbps;  0% disk IO; 0.3 GB / 13.9 GB RAM  )
  10.112.57.20:4501      (114% cpu; 31% machine; 0.037 Gbps; 27% disk IO; 3.6 GB / 14.1 GB RAM  )
  10.112.58.21:4501      ( 91% cpu; 24% machine; 0.009 Gbps; 31% disk IO; 3.5 GB / 14.1 GB RAM  )
  10.112.59.6:4501       ( 15% cpu;  5% machine; 0.040 Gbps; 94% disk IO; 0.6 GB / 14.1 GB RAM  )
  10.112.60.13:4501      (110% cpu; 30% machine; 0.030 Gbps; 27% disk IO; 4.6 GB / 14.1 GB RAM  )
  10.112.61.4:4501       (100% cpu; 28% machine; 0.142 Gbps; 32% disk IO; 3.5 GB / 14.1 GB RAM  )
  10.112.62.3:4501       ( 96% cpu; 26% machine; 0.082 Gbps; 32% disk IO; 3.5 GB / 14.1 GB RAM  )
  10.112.63.4:4501       ( 57% cpu; 16% machine; 0.005 Gbps; 24% disk IO; 3.6 GB / 14.1 GB RAM  )
  10.112.64.3:4501       (109% cpu; 30% machine; 0.010 Gbps; 32% disk IO; 3.7 GB / 14.2 GB RAM  )
  10.112.65.3:4501       (103% cpu; 28% machine; 0.011 Gbps; 32% disk IO; 3.6 GB / 14.1 GB RAM  )
  10.112.70.37:4501      ( 90% cpu; 49% machine; 0.012 Gbps; 24% disk IO; 3.6 GB / 13.8 GB RAM  )
  10.112.71.3:4501       ( 73% cpu; 42% machine; 0.017 Gbps; 29% disk IO; 3.4 GB / 13.8 GB RAM  )
  10.112.72.3:4501       (106% cpu; 70% machine; 0.092 Gbps; 24% disk IO; 3.7 GB / 13.7 GB RAM  )
  10.112.76.65:4501      ( 18% cpu; 45% machine; 0.010 Gbps;  0% disk IO; 0.3 GB / 10.5 GB RAM  )
  10.112.82.20:4501      ( 22% cpu; 57% machine; 0.070 Gbps;  0% disk IO; 0.4 GB / 9.3 GB RAM  )
  10.112.83.7:4501       (  3% cpu; 11% machine; 0.001 Gbps;  0% disk IO; 0.4 GB / 10.6 GB RAM  )

Coordination servers:
  10.112.52.28:4501  (reachable)
  10.112.60.13:4501  (reachable)
  10.112.64.3:4501  (reachable)

Client time: 06/10/20 17:12:05

fdb> setclass
There are currently 26 processes in the database:
  10.112.12.222:4501: stateless (command_line)
  10.112.25.85:4501: stateless (command_line)
  10.112.28.202:4501: stateless (command_line)
  10.112.42.223:4501: stateless (command_line)
  10.112.49.139:4501: stateless (command_line)
  10.112.51.8:4501: storage (command_line)
  10.112.52.28:4501: storage (command_line)
  10.112.53.5:4501: log (command_line)
  10.112.54.5:4501: log (command_line)
  10.112.55.10:4501: storage (command_line)
  10.112.56.10:4501: stateless (command_line)
  10.112.57.20:4501: storage (command_line)
  10.112.58.21:4501: storage (command_line)
  10.112.59.6:4501: log (command_line)
  10.112.60.13:4501: storage (command_line)
  10.112.61.4:4501: storage (command_line)
  10.112.62.3:4501: storage (command_line)
  10.112.63.4:4501: storage (command_line)
  10.112.64.3:4501: storage (command_line)
  10.112.65.3:4501: storage (command_line)
  10.112.70.37:4501: storage (command_line)
  10.112.71.3:4501: storage (command_line)
  10.112.72.3:4501: storage (command_line)
  10.112.76.65:4501: stateless (command_line)
  10.112.82.20:4501: stateless (command_line)
  10.112.83.7:4501: stateless (command_line)

In your most recent numbers, you have 14 storage servers. Most are saturated, and all are busy. That’s good news, since it means your workload is distributed across many parts of the key space, and the client is offering sufficient concurrency to keep the cluster busy.

I suspect that your NVMe drives and network are basically idle in this setup; iostat and nload should be able to confirm this.

Can you try running with 2 and 4 storage servers per drive? If I’m right, that will increase your throughput.
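Outside the Kubernetes operator, running two storage servers on one drive in a plain foundationdb.conf would look something like the sketch below (ports and paths are illustrative): each fdbserver section gets its own port and its own data directory on the same disk.

```ini
## Sketch: two storage processes sharing one NVMe drive.
## Paths and ports are examples, not taken from this cluster.
[fdbserver.4500]
class = storage
datadir = /mnt/nvme0/fdb/4500

[fdbserver.4501]
class = storage
datadir = /mnt/nvme0/fdb/4501
```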

Well, thank you for your input, but as far as I remember the Kubernetes FoundationDB operator doesn't support multiple storage processes on the same drive. One thing that did help with this situation was removing the knob_disable_posix_kernel_aio=1 parameter: the CPU load on the storage processes is now around 30-50% with the same workload.