Hi!
I noticed that on my cluster some of the stateless processes are at 100% CPU usage, and all of the affected ones have the proxy role. All the other processes (tlog and storage classes) average between 20% and 50% CPU.
So my first thought was: "I have plenty of standby stateless processes, so I'll roughly double the number of proxies and that will solve the issue!" After changing the number of proxies, the only effect I observed was higher network upload and download traffic. The goal, lower CPU usage per proxy and better cluster performance, was not achieved.
The proxies have stayed at 100% CPU for a long time, and I have not found a solution.
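One way to confirm which processes are saturated is to read the per-process CPU figures from the machine-readable status. The sketch below is a hedged example, not an official tool: it assumes `fdbcli --exec "status json"` is available and that the JSON follows the layout `cluster.processes.<id>` with `address`, `roles[].role` and `cpu.usage_cores` fields. The role names in `PROXY_ROLES` are also an assumption (`proxy` in 6.x, `commit_proxy`/`grv_proxy` in 7.x); adjust them to whatever your version reports.

```python
import json
import subprocess
from typing import Dict, Iterable, List, Tuple

# Assumed role names: "proxy" in FDB 6.x, split into "commit_proxy"
# and "grv_proxy" in 7.x. Adjust to what your cluster reports.
PROXY_ROLES = {"proxy", "commit_proxy", "grv_proxy"}


def cpu_by_role(status: Dict, roles: Iterable[str] = PROXY_ROLES) -> List[Tuple[str, str, float]]:
    """Return (address, role, cpu_cores) for each process holding one of `roles`.

    Assumes the layout cluster.processes.<id>.{address, roles[].role, cpu.usage_cores}.
    """
    wanted = set(roles)
    rows = []
    for proc in status.get("cluster", {}).get("processes", {}).values():
        cores = proc.get("cpu", {}).get("usage_cores", 0.0)
        for r in proc.get("roles", []):
            if r.get("role") in wanted:
                rows.append((proc.get("address", "?"), r["role"], cores))
    # Busiest processes first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def load_status() -> Dict:
    """Fetch machine-readable status; needs fdbcli on PATH and a reachable cluster."""
    out = subprocess.run(
        ["fdbcli", "--exec", "status json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

For example, `for row in cpu_by_role(load_status()): print(row)`. Since an fdbserver process does most of its work on a single thread, a `usage_cores` value close to 1.0 means that process is effectively pinned at 100% of one core.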
My questions:
- Do proxies really scale horizontally?
- How can I reduce the proxies' CPU usage and get better performance (lower latency)?
My 8-node FDB cluster, installed on bare metal, has the following configuration:
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 5
  Exclusions             - 208 (type `exclude' for details)
  Desired Proxies        - 3
  Desired Resolvers      - 1
  Desired Logs           - 24
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 192
  Zones                  - 8
  Machines               - 8
  Memory availability    - 5.0 GB per process on machine with least available
  Retransmissions rate   - 10 Hz
  Fault Tolerance        - 2 zones
  Server time            - 10/10/22 18:42:27

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 654.778 GB
  Disk space used        - 2.556 TB

Operating space:
  Storage server         - 1503.8 GB free on most full server
  Log server             - 885.8 GB free on most full server

Workload:
  Read rate              - 48058 Hz
  Write rate             - 30617 Hz
  Transactions started   - 56685 Hz
  Transactions committed - 15567 Hz
  Conflict rate          - 2 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0
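To put the workload numbers above in perspective, a back-of-the-envelope calculation divides the started and committed transaction rates by the proxy count. It assumes the load is spread evenly across proxies, which is a simplification, and the function name `per_proxy_load` is hypothetical.

```python
from typing import Tuple


def per_proxy_load(started_hz: float, committed_hz: float, proxies: int) -> Tuple[float, float]:
    """Average (started, committed) transactions per second handled by each proxy.

    Simplifying assumption: requests are distributed evenly across proxies.
    """
    if proxies <= 0:
        raise ValueError("proxies must be positive")
    return started_hz / proxies, committed_hz / proxies


# Figures taken from the `status` output above.
STARTED_HZ = 56685
COMMITTED_HZ = 15567

for n in (3, 6):  # the configured proxy count, and roughly double it
    started, committed = per_proxy_load(STARTED_HZ, COMMITTED_HZ, n)
    print(f"{n} proxies: {started:.1f} started/s, {committed:.1f} committed/s per proxy")
```

With 3 proxies this works out to about 18,895 started and 5,189 committed transactions per second per proxy; doubling to 6 halves both (9,447.5 and 2,594.5). Comparing these figures with per-proxy CPU before and after the change shows whether the work actually spread out.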
Server hardware configuration (every server has the same hardware and the same number of fdb processes):
- CPU: 32 hyper-threads (16 cores x 2.10 GHz)
- Memory: 128GB RAM
- Storage: 1x NVMe disk for tlog processes, 1x NVMe disk for storage server processes
- Network: 10G