I noticed that some of the stateless processes in my cluster (all of the affected ones have the proxy role) are at 100% CPU usage. All the others (tlog and storage classes) average between 20% and 50% CPU.
So my first thought was: "I have plenty of standby stateless processes, so I will roughly double the number of proxies and that will resolve the issue!" After changing the number of proxies, the only effect I observed was higher network upload and download traffic. The goal, lower CPU usage on the proxies and better cluster performance, was not achieved.
The proxies have been at 100% CPU usage for a long time now, and I have not found a solution.
My questions:
- Do proxies really scale out when more of them are added?
- How can I decrease the CPU usage of the proxies and get better performance (lower latency)?
My 8-node FDB cluster runs on bare metal and has the following configuration:
fdb> status

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 5
  Exclusions             - 208 (type `exclude' for details)
  Desired Proxies        - 3
  Desired Resolvers      - 1
  Desired Logs           - 24
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 192
  Zones                  - 8
  Machines               - 8
  Memory availability    - 5.0 GB per process on machine with least available
  Retransmissions rate   - 10 Hz
  Fault Tolerance        - 2 zones
  Server time            - 10/10/22 18:42:27

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 654.778 GB
  Disk space used        - 2.556 TB

Operating space:
  Storage server         - 1503.8 GB free on most full server
  Log server             - 885.8 GB free on most full server

Workload:
  Read rate              - 48058 Hz
  Write rate             - 30617 Hz
  Transactions started   - 56685 Hz
  Transactions committed - 15567 Hz
  Conflict rate          - 2 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0
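For anyone who wants to reproduce the per-role CPU comparison, here is a minimal sketch of how the numbers can be grouped from the machine-readable status. It assumes the output of `fdbcli --exec 'status json'` has been saved to a file and that each entry under `cluster.processes` carries a `cpu.usage_cores` value and a `roles` list, which is how the status JSON looks on my version; field names may differ on other releases. The function names `cpu_by_role` and `summarize` are just illustrative helpers, not part of any FDB tooling.

```python
from collections import defaultdict


def cpu_by_role(status):
    """Group per-process CPU usage (in cores) by role name.

    `status` is the parsed dict from `status json`. A process that holds
    several roles is counted once under each of them.
    """
    usage = defaultdict(list)
    processes = status.get("cluster", {}).get("processes", {})
    for proc in processes.values():
        cores = proc.get("cpu", {}).get("usage_cores")
        if cores is None:
            # Skip processes that did not report CPU usage.
            continue
        for role in proc.get("roles", []):
            usage[role.get("role", "unknown")].append(cores)
    return dict(usage)


def summarize(usage):
    """Return {role: (process_count, mean_cores, max_cores)}."""
    return {
        role: (len(vals), sum(vals) / len(vals), max(vals))
        for role, vals in usage.items()
    }


# Example use (the file name is an assumption):
#   import json
#   with open("status.json") as f:
#       print(summarize(cpu_by_role(json.load(f))))
```

A mean close to 1.0 core for the `proxy` role alongside much lower values for `log` and `storage` would match what I am seeing in the process monitors.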
Server hardware configuration (every server has the same hardware and runs the same number of fdb processes):
- CPU: 32 hardware threads (16 cores x 2.10 GHz, Hyper-Threading enabled)
- Memory: 128GB RAM
- Storage: 1x NVMe disk for tlog processes, 1x NVMe disk for storage server processes
- Network: 10G