We’re performing some load tests with a new FoundationDB cluster (7.3.43 with three_data_hall replication and the Redwood storage engine).
In this test, we scaled up load to the point where our log processes’ CPU consumption was very close to saturation. At that point, we doubled the number of log processes and commit proxies from 4 to 8. We were surprised to observe that per-process CPU utilization on both process classes stayed the same even though the load was unchanged; in other words, we went from 4 log processes at 95% CPU utilization to 8 log processes at 95% CPU utilization. We also observed a modest increase in commit latency as reported by the latency probe.
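To make the surprise concrete, here is the naive arithmetic behind what we expected. This is only a sketch: it assumes the total log CPU work is fixed for a given load and divides evenly across log processes, which may well be the wrong model for how FoundationDB distributes work. The function name is hypothetical.

```python
def naive_cpu_per_process(old_count: int, old_util: float, new_count: int) -> float:
    """Naive estimate of per-process CPU utilization after resizing.

    Assumption (ours, not FoundationDB's): total CPU work stays constant
    under unchanged load and is spread evenly across all processes.
    """
    total_work = old_count * old_util  # total work in "cores' worth" of CPU
    return total_work / new_count


expected = naive_cpu_per_process(old_count=4, old_util=0.95, new_count=8)
observed = 0.95  # what we actually saw on each of the 8 log processes

print(f"naive expectation: {expected:.1%} per log process")
print(f"observed:          {observed:.1%} per log process")
```

Under that assumption we would have expected each log process to drop to roughly half its previous utilization, but the observed figure did not move.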
Is this expected, or have we committed a scaling faux pas? In terms of scaling log processes, what metrics should we be watching to decide that it’s time to add more log processes? Is it normal/expected for log processes to be CPU-limited?
Thanks!