FDB 6.2 - proxy processes have 100% CPU usage

erplus · October 10, 2022, 5:18pm

Hi!

I noticed that on my cluster some of the stateless processes (all of the affected ones have the proxy role) are at 100% CPU usage. All the others (tlog and storage classes) average between 20% and 50% CPU.

So my first thought was: “I have plenty of standby stateless processes, so I'll roughly double the number of proxies and that will resolve the issue!” After changing the number of proxies, the only effect I observed was increased network upload and download traffic. The goal, lower CPU usage on the proxies and better cluster performance, was not achieved.

The proxies have been stuck at 100% CPU usage for a long time, and I have not found any solution.

My questions:

Do proxies really scale up?
How can I decrease the proxies' CPU usage and get better performance (lower latency)?

My 8-node FDB cluster, installed on bare metal, has the following configuration:

fdb> status

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 5
  Exclusions             - 208 (type `exclude' for details)
  Desired Proxies        - 3
  Desired Resolvers      - 1
  Desired Logs           - 24
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 192
  Zones                  - 8
  Machines               - 8
  Memory availability    - 5.0 GB per process on machine with least available
  Retransmissions rate   - 10 Hz
  Fault Tolerance        - 2 zones
  Server time            - 10/10/22 18:42:27

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 654.778 GB
  Disk space used        - 2.556 TB

Operating space:
  Storage server         - 1503.8 GB free on most full server
  Log server             - 885.8 GB free on most full server

Workload:
  Read rate              - 48058 Hz
  Write rate             - 30617 Hz
  Transactions started   - 56685 Hz
  Transactions committed - 15567 Hz
  Conflict rate          - 2 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Server hardware configuration (every single server has the same configuration and the same number of fdb processes):

CPU: 32 HT (16 cores x 2.10GHz)
Memory: 128GB RAM
Storage: 1x NVMe disk for tlog processes, 1x NVMe disk for storage server processes
Network: 10G
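As a rough check on the question, the workload figures from `status` can be split across proxies. This is a back-of-the-envelope sketch that assumes client requests are spread evenly over the proxies, which real load balancing may not do:

```python
# Back-of-the-envelope split of the workload reported by `status` above.
# ASSUMPTION: requests are spread evenly across proxies (real balancing may differ).
STARTED_HZ = 56685    # "Transactions started"
COMMITTED_HZ = 15567  # "Transactions committed"


def per_proxy_rates(proxies: int) -> tuple[float, float]:
    """Return (GRV requests/s, commits/s) handled by one proxy under even spreading."""
    if proxies < 1:
        raise ValueError("need at least one proxy")
    return STARTED_HZ / proxies, COMMITTED_HZ / proxies


if __name__ == "__main__":
    for n in (3, 6):  # the configured value and the "roughly doubled" experiment
        grv, commits = per_proxy_rates(n)
        print(f"{n} proxies: ~{grv:,.0f} GRV/s and ~{commits:,.0f} commits/s per proxy")
```

Under that assumption, going from 3 to 6 proxies should roughly halve each proxy's client-driven load. The CPU staying at 100% after the change suggests that per-proxy cost is not driven by client requests alone.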

andrew.noyes · October 10, 2022, 9:01pm

A couple things come to mind:

In FDB 6.2, proxies need to contact every other proxy in order to serve a GetReadVersion request (i.e. to start a transaction). The commit protocol was changed in version 7.0 so that this is no longer necessary.
The amount of work a proxy needs to do to commit a transaction batch scales with the number of tlogs.
Looking through the release notes, there was at least one bugfix that addressed “overloaded proxies” ([release 6.3] Fix get key location overloading proxies by halfprice · Pull Request #6453 · apple/foundationdb · GitHub) that I think may still be present in 6.2.
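The first two points can be illustrated with a toy model of per-proxy message counts. It is a sketch under loudly stated assumptions, not a description of FDB's actual implementation: the batch rates are made-up numbers held constant as the proxy count changes, each 6.2 GRV batch is assumed to query every other proxy, and each commit batch is assumed to be pushed to every tlog.

```python
"""Toy message fan-out model for proxies (a simplification, not FDB source code)."""

# Hypothetical per-proxy batch rates, held fixed as the proxy count changes.
GRV_BATCHES_PER_SEC = 1000.0
COMMIT_BATCHES_PER_SEC = 500.0


def peer_messages_per_proxy(num_proxies: int) -> float:
    """GRV coordination messages one proxy sends plus answers per second.

    Assumes each GRV batch asks every other proxy for its committed version,
    and that every other proxy asks this one at the same batch rate.
    """
    others = num_proxies - 1
    sent = GRV_BATCHES_PER_SEC * others
    answered = GRV_BATCHES_PER_SEC * others
    return sent + answered


def tlog_messages_per_proxy(num_tlogs: int) -> float:
    """Messages one proxy sends per second, assuming each commit batch goes to every tlog."""
    return COMMIT_BATCHES_PER_SEC * num_tlogs


if __name__ == "__main__":
    for n in (3, 6):
        per_proxy = peer_messages_per_proxy(n)
        print(f"{n} proxies: {per_proxy:,.0f} peer msgs/s per proxy, "
              f"{n * per_proxy:,.0f} cluster-wide")
    for t in (24, 12):
        print(f"{t} tlogs: {tlog_messages_per_proxy(t):,.0f} tlog msgs/s per proxy")
```

With these made-up rates, going from 3 to 6 proxies raises one proxy's coordination traffic from 4,000 to 10,000 messages/s and the cluster-wide total from 12,000 to 60,000, while halving the tlog count halves the per-proxy push traffic. In this model adding proxies adds work to each proxy instead of removing it, which would be consistent with the extra network traffic reported in the question.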

It’s hard to say exactly why this is happening, but it does make sense that adding proxies might not reduce CPU usage. I’m not sure what to recommend. I would consider upgrading the cluster to 7.0+ and/or experimenting with fewer tlogs (and, somewhat counterintuitively, fewer proxies).
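One way to run that experiment is to step the configuration down gradually and compare proxy CPU and latency after each change. The sketch below only builds (and, when `dry_run` is off, runs) `fdbcli --exec` commands using the 6.2-era `configure proxies=... logs=...` syntax. The step values in `PLAN` are hypothetical, and 7.0+ splits proxies into `commit_proxies` and `grv_proxies`, so the command string would need adjusting there.

```python
import shlex
import subprocess

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # path shown in the status output above


def configure_cmd(proxies: int, logs: int, cluster_file: str = CLUSTER_FILE) -> list[str]:
    """Build an fdbcli invocation using 6.2-era `configure proxies=... logs=...` syntax."""
    if proxies < 1 or logs < 1:
        raise ValueError("proxies and logs must be positive")
    return ["fdbcli", "-C", cluster_file, "--exec",
            f"configure proxies={proxies} logs={logs}"]


# Hypothetical experiment plan: step down from the current 3 proxies / 24 logs.
PLAN = [(3, 16), (3, 12), (2, 12)]


def run_plan(dry_run: bool = True) -> None:
    for proxies, logs in PLAN:
        cmd = configure_cmd(proxies, logs)
        print(shlex.join(cmd))
        if not dry_run:
            # Changing proxies/logs recruits a new transaction subsystem (a recovery);
            # let the cluster settle and compare proxy CPU and latency before the next step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(dry_run=True)  # print the commands only
```

Stepping one knob at a time keeps it clear which change, if any, moved proxy CPU.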
