Curious about CPU utilization of Single-core Write Test #1

I am testing the performance of a single machine using the command fdbserver -r multitest -f RandomRead.txt --num-testers 1
The test showed that 31% of a single core was used, not the 3% of a single core that the documentation claims; that is an order of magnitude difference.

I’m wondering whether there is something wrong with the way I’m testing, or whether the documentation is simply wrong.
I hope you can help me look into this problem. Thank you very much!

Here are my machine environment and test configuration:

RandomRead.txt:

testTitle=RandomReadWriteTest
    testName=ReadWrite
    testDuration=60.0
    transactionsPerSecond=970
    writesPerTransactionA=1
    readsPerTransactionA=0
    alpha=0
    nodeCount=1000000
    valueBytes=16
    discardEdgeMeasurements=false

fdbserver performance result:

startingConfiguration: start
Set perpetual_storage_wiggle=0 ...
Set perpetual_storage_wiggle=0 Done.
Run test:RandomReadWriteTest start
setting up test (RandomReadWriteTest)...
running test (RandomReadWriteTest)...
RandomReadWriteTest complete
checking test (RandomReadWriteTest)...
fetching metrics (RandomReadWriteTest)...
Metric (0, 0): Measured Duration, 60.000000, 60
Metric (0, 1): Transactions/sec, 967.833333, 968
Metric (0, 2): Operations/sec, 967.833333, 968
Metric (0, 3): A Transactions, 58070.000000, 58070
Metric (0, 4): B Transactions, 0.000000, 0
Metric (0, 5): Retries, 0.000000, 0
Metric (0, 6): Mean load time (seconds), 0.000000, 0
Metric (0, 7): Read rows, 0.000000, 0
Metric (0, 8): Write rows, 58070.000000, 5.81e+04
Metric (0, 9): Mean Latency (ms), 0.915235, 0.915
Metric (0, 10): Median Latency (ms, averaged), 0.908136, 0.908
Metric (0, 11): 90% Latency (ms, averaged), 1.272678, 1.27
Metric (0, 12): 98% Latency (ms, averaged), 1.796007, 1.8
Metric (0, 13): Max Latency (ms, averaged), 3.895044, 3.9
Metric (0, 14): Mean Row Read Latency (ms), 0.000000, 0
Metric (0, 15): Median Row Read Latency (ms, averaged), 0.000000, 0
Metric (0, 16): Max Row Read Latency (ms, averaged), 0.000000, 0
Metric (0, 17): Mean Total Read Latency (ms), 0.000000, 0
Metric (0, 18): Median Total Read Latency (ms, averaged), 0.000000, 0
Metric (0, 19): Max Total Latency (ms, averaged), 0.000000, 0
Metric (0, 20): Mean GRV Latency (ms), 0.181912, 0.182
Metric (0, 21): Median GRV Latency (ms, averaged), 0.158787, 0.159
Metric (0, 22): Max GRV Latency (ms, averaged), 1.470566, 1.47
Metric (0, 23): Mean Commit Latency (ms), 0.745237, 0.745
Metric (0, 24): Median Commit Latency (ms, averaged), 0.752211, 0.752
Metric (0, 25): Max Commit Latency (ms, averaged), 3.629923, 3.63
Metric (0, 26): Read rows/sec, 0.000000, 0
Metric (0, 27): Write rows/sec, 967.833333, 968
Metric (0, 28): Bytes read/sec, 0.000000, 0
Metric (0, 29): Bytes written/sec, 30970.666667, 3.1e+04
1 test clients passed; 0 test clients failed
Run test:RandomReadWriteTest Done.

1 tests passed; 0 tests failed.
Waiting for DD to end...
Set perpetual_storage_wiggle=0 ...
Set perpetual_storage_wiggle=0 Done.

foundationdb status:

Port 4501 is the test client.

fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 1
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 2
  Zones                  - 1
  Machines               - 1
  Memory availability    - 8.0 GB per process on machine with least available
  Fault Tolerance        - 0 machines
  Server time            - 08/03/22 13:48:40

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 31 MB
  Disk space used        - 263 MB

Operating space:
  Storage server         - 3395.2 GB free on most full server
  Log server             - 3395.2 GB free on most full server

Workload:
  Read rate              - 20 Hz
  Write rate             - 946 Hz
  Transactions started   - 954 Hz
  Transactions committed - 946 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  127.0.0.1:4500         ( 31% cpu;  2% machine; 0.009 Gbps; 97% disk IO; 0.6 GB / 8.0 GB RAM  )
  127.0.0.1:4501         ( 12% cpu;  2% machine; 0.009 Gbps; 97% disk IO; 0.1 GB / 8.0 GB RAM  )

Coordination servers:
  127.0.0.1:4500  (reachable)

Client time: 08/03/22 13:48:40

machine configuration:

CPU: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz

NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Linux VM-0-15-ubuntu 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Memory: 64GB
Disk: SSD 3T

You are running a native workload (as in, it doesn’t use the C bindings). This code is written in Flow, which uses asynchronous programming and multiplexing.

So while this test is indeed single-threaded, it is highly concurrent: by default it starts 50 actors that run these transactions concurrently. If you want to simulate the example from the documentation, you should set actorCount=1.

Here’s how the workload you’re running is being started:

		for (int c = 0; c < self->actorCount; c++) {
			Future<Void> worker;
			if (self->useRYW)
				worker = self->randomReadWriteClient<ReadYourWritesTransaction>(
				    cx, self, self->actorCount / self->transactionsPerSecond, c);
			else
				worker = self->randomReadWriteClient<Transaction>(
				    cx, self, self->actorCount / self->transactionsPerSecond, c);
			clients.push_back(worker);
		}

		if (!self->cancelWorkersAtDuration)
			self->clients = clients; // Don't cancel them until check()

		wait(self->cancelWorkersAtDuration ? timeout(waitForAll(clients), self->testDuration, Void())
		                                   : delay(self->testDuration));
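
Note the per-client delay passed to each worker, actorCount / transactionsPerSecond: each actor paces itself so that the aggregate rate stays at transactionsPerSecond no matter how many actors run. As a rough illustration only (a minimal sketch, not the actual randomReadWriteClient; the name pacedWriteClient and the fixed key/value are invented for the example), each client loop does something like this:

#include "fdbclient/NativeAPI.actor.h"
#include "flow/actorcompiler.h" // this include must come last

// Minimal sketch of one paced write client (hypothetical, for illustration).
// With N such actors each sleeping N / transactionsPerSecond seconds between
// commits, the cluster still sees transactionsPerSecond commits/sec in aggregate.
ACTOR Future<Void> pacedWriteClient(Database cx, double delayPerTx, Key key, Value value) {
	loop {
		state Transaction tr(cx);
		loop {
			try {
				tr.set(key, value); // one blind write, like writesPerTransactionA=1
				wait(tr.commit());
				break;
			} catch (Error& e) {
				wait(tr.onError(e)); // standard FDB retry idiom
			}
		}
		wait(delay(delayPerTx)); // delayPerTx = actorCount / transactionsPerSecond
	}
}

With actorCount=1 there is exactly one such client issuing one write at a time, which is the closest match to the scenario the documentation describes.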

First of all, thank you very much.
Based on your suggestion, I tried it.
The result: 970 writes/second, but single-core CPU usage is 42%, which is still a big difference from the documented 3%.

command: fdbserver -r multitest -f RandomRead.txt --num-testers 1

RandomRead.txt

testTitle=RandomReadWriteTest
    testName=ReadWrite
    testDuration=60.0
    writesPerTransactionA=1
    readsPerTransactionA=0
    alpha=0
    actorCount=1
    actorCountPerTester=1
    nodeCount=1000000
    keyBytes=16
    valueBytes=16
    discardEdgeMeasurements=false

fdbserver performance result:

startingConfiguration: start
Set perpetual_storage_wiggle=0 ...
Set perpetual_storage_wiggle=0 Done.
Run test:RandomReadWriteTest start
setting up test (RandomReadWriteTest)...
running test (RandomReadWriteTest)...
RandomReadWriteTest complete
checking test (RandomReadWriteTest)...
fetching metrics (RandomReadWriteTest)...
Metric (0, 0): Measured Duration, 60.000000, 60
Metric (0, 1): Transactions/sec, 969.800000, 970
Metric (0, 2): Operations/sec, 969.800000, 970
Metric (0, 3): A Transactions, 58188.000000, 58188
Metric (0, 4): B Transactions, 0.000000, 0
Metric (0, 5): Retries, 0.000000, 0
Metric (0, 6): Mean load time (seconds), 0.000000, 0
Metric (0, 7): Read rows, 0.000000, 0
Metric (0, 8): Write rows, 58188.000000, 5.82e+04
Metric (0, 9): Mean Latency (ms), 1.030065, 1.03
Metric (0, 10): Median Latency (ms, averaged), 1.014948, 1.01
Metric (0, 11): 90% Latency (ms, averaged), 1.065969, 1.07
Metric (0, 12): 98% Latency (ms, averaged), 1.093149, 1.09
Metric (0, 13): Max Latency (ms, averaged), 3.804922, 3.8
Metric (0, 14): Mean Row Read Latency (ms), 0.000000, 0
Metric (0, 15): Median Row Read Latency (ms, averaged), 0.000000, 0
Metric (0, 16): Max Row Read Latency (ms, averaged), 0.000000, 0
Metric (0, 17): Mean Total Read Latency (ms), 0.000000, 0
Metric (0, 18): Median Total Read Latency (ms, averaged), 0.000000, 0
Metric (0, 19): Max Total Latency (ms, averaged), 0.000000, 0
Metric (0, 20): Mean GRV Latency (ms), 0.168000, 0.168
Metric (0, 21): Median GRV Latency (ms, averaged), 0.158310, 0.158
Metric (0, 22): Max GRV Latency (ms, averaged), 1.307249, 1.31
Metric (0, 23): Mean Commit Latency (ms), 0.864673, 0.865
Metric (0, 24): Median Commit Latency (ms, averaged), 0.854492, 0.854
Metric (0, 25): Max Commit Latency (ms, averaged), 2.990484, 2.99
Metric (0, 26): Read rows/sec, 0.000000, 0
Metric (0, 27): Write rows/sec, 969.800000, 970
Metric (0, 28): Bytes read/sec, 0.000000, 0
Metric (0, 29): Bytes written/sec, 31033.600000, 3.1e+04
1 test clients passed; 0 test clients failed
Run test:RandomReadWriteTest Done.

1 tests passed; 0 tests failed.
Waiting for DD to end...
Set perpetual_storage_wiggle=0 ...
Set perpetual_storage_wiggle=0 Done.

foundationdb status:

Port 4508 is the test client.

fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 1
  Exclusions             - 3 (type `exclude' for details)
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 2
  Zones                  - 1
  Machines               - 1
  Memory availability    - 8.0 GB per process on machine with least available
  Retransmissions rate   - 0 Hz
  Fault Tolerance        - 0 machines
  Server time            - 08/04/22 11:09:23

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 31 MB
  Disk space used        - 328 MB

Operating space:
  Storage server         - 3395.2 GB free on most full server
  Log server             - 3395.2 GB free on most full server

Workload:
  Read rate              - 23 Hz
  Write rate             - 836 Hz
  Transactions started   - 840 Hz
  Transactions committed - 836 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  127.0.0.1:4500         ( 42% cpu;  3% machine; 0.009 Gbps;100% disk IO; 1.2 GB / 8.0 GB RAM  )
  127.0.0.1:4508         ( 13% cpu;  3% machine; 0.009 Gbps;100% disk IO; 0.1 GB / 8.0 GB RAM  )

Coordination servers:
  127.0.0.1:4500  (reachable)

Client time: 08/04/22 11:09:23

That “documentation” page appears to be from before the FoundationDB acquisition, which explains why it reads more like marketing material, and some of the similar performance pages have claims which are horrendously out of date. They should probably be cleaned up at some point…

The largest relevant change I can think of is that back then memory was the default storage engine, which would be significantly less resource intensive than ssd, but limited to storing only 1GB of data by default. It made sense as a default setting when HDDs were common, but made much less sense once most developers had laptops with SSDs in them.
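
If you want to see how much of the difference comes from the storage engine, you can switch engines from fdbcli and rerun the same test (a sketch; note that configure migrates the data to the new engine, and the memory engine is limited to roughly 1GB per process by default):

fdb> configure memory
fdb> configure ssd

The first command switches the cluster to the old in-memory default; the second switches back to the ssd-2 B-tree engine your cluster is currently using.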

Did you have some deeper reason for investigating FDB single core performance, or were you just curious about validating the claims on the website?

Thank you for your reply.

We chose FDB and want an accurate understanding of its performance, so we started by measuring its standalone performance; later we plan to measure its cluster performance and its performance in multi-redundancy modes. When I saw the numbers in the documentation, I started to wonder whether my testing method or my FDB configuration was wrong. Now it looks like the data in the documentation needs to be updated.

The ultimate goal is to determine what machine configuration to use, and how much hardware to invest, based on the measured performance.

Can you give some advice on how to achieve this goal?

Single-process performance of FDB won’t provide good intuition about what clustered FDB performance would be. There are many threads in the forums over the years about configuring and benchmarking a cluster; I’d suggest starting by searching for “process class”, which hits most of the related threads.
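
For example, a common first step from those threads is to run several fdbserver processes per machine and give each an explicit class in foundationdb.conf (a sketch only; the ports and the mix of classes are illustrative, not a sizing recommendation):

[fdbserver.4500]
class = storage

[fdbserver.4501]
class = transaction

[fdbserver.4502]
class = stateless

Separating the storage, log (transaction class), and stateless roles onto dedicated processes is usually the first thing those threads suggest before scaling out.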

If I recall correctly, Segment also described their exact setup, testing, and path to production in their FoundationDB Summit talk.

cool!
Thank you for such a patient reply. :grinning: