FoundationDB

Why doesn't my cluster performance scale when I double the number of machines?


(Matt Lohier) #1

Hello fellow FDB aficionado! :grinning:

I was wondering if you could help me identify why I am not seeing any improvement after I scale out my FDB cluster.

I have read the performance page and I know it is supposed to scale… I must be doing something incorrectly.

Explanation:
Let me be more specific with a benchmark I just ran today using FDB 5.2 on Linux on AWS/EC2.

I started a cluster of 10 instances (i3.xlarge with 900GB NVMe SSD, high network throughput, 4 vCPU).
For information, I had 16 storage processes (2 per disk, i.e. 2 per instance), 2 transaction log processes (1 per disk, on separate instances), and the rest of the processes stateless. I had 2 proxies configured, 3 coordinators and the 1 default resolver. Storage is ssd with double replication. This is based on recommendations from Alex Miller and Thomas Johnson (it wouldn’t be proper to tag them directly here).

Then I start 25 tester instances to generate transactions at a rate of 25,000 transactions per second. Each transaction consists of 1 read and 5 writes, so in total I have 125,000 writes/sec and 25,000 reads/sec.
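
The load figures above follow from simple arithmetic. A minimal sketch using only the numbers quoted in this post (the function name is hypothetical):

```python
def offered_load(tps, reads_per_txn, writes_per_txn):
    """Return (reads/sec, writes/sec) offered to the cluster at a given transaction rate."""
    return tps * reads_per_txn, tps * writes_per_txn

# 25 testers, 25,000 transactions/sec, 1 read + 5 writes per transaction
reads_per_sec, writes_per_sec = offered_load(25_000, reads_per_txn=1, writes_per_txn=5)
print(reads_per_sec, writes_per_sec)  # 25000 125000
```

The same function gives 50,000 reads/sec and 250,000 writes/sec for the 50,000 tps run.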

I’m getting good transaction latency, 40ms on average (recorded on the client/tester side).
(Looking at the FDB latency probe, transaction and read are under 10ms, but commit is at about 40ms, which may indicate the beginning of a problem.)

I increase the load by 10k, and eventually repeat the test with 50 tester instances generating 50,000 transactions/sec. This is too much for the cluster: it seems to saturate (the latency probe for commit is at 2.5s when that happens, and I am getting about 5% conflicts).

So I found the limit of the cluster, and I’m expecting that if I scale out to 20 FDB instances I might be able to serve that transaction rate. Hence I start 10 more instances to join the cluster and apply the same ratios: log processes doubled to 4, and 4 proxies too. I now have 32 storage processes on 20 machines.

I repeat the test with 25 testers (25,000 tps) and I get exactly the same transaction processing time (40-50ms) as when I had 10 machines.
I also repeat the test with 50 testers (50,000 tps) and get into the same issue, the cluster is saturated same as when I had 10 FDB machines.

Why would that be…?
If CPU or even disk IO was a bottleneck with 10 instances shouldn’t that get avoided if I double the number of machines… Could I be hitting some other limit? The resolver process maybe… isn’t it recommended to have a single resolver tho?

I would really appreciate your help on this.

Thank you!


(Alex Miller) #2

After adding the additional processes, and before running your second test, did you run configure logs=4 proxies=4 from fdbcli so that FDB actually places new logs and proxies in the processes that you gave it?

Process classes specify where FDB is allowed to place roles that it tries to recruit. The configuration specifies how many of what roles FDB will try to recruit somewhere.


(Matt Lohier) #3

Yes I did Alex because I noticed they were not recruited otherwise. status json confirmed the 32 storage processes and 4 log processes on 4 different instances.


(Alex Miller) #4

Can you double check that your 4 proxies are on different machines as well (and different from the machines with log processes)? Are you sure you aren’t bound on a single machine’s network bandwidth somewhere? Can you pastebin your status json and post that?
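
One way to check the placement is to group roles by machine from a saved `status json`. This is only a sketch: it assumes the JSON has a `cluster.processes` map whose entries carry `address` and a `roles` list of objects with a `role` field, which should be verified against the output of the FDB version in use. The sample data and function names are hypothetical.

```python
from collections import defaultdict

def roles_by_machine(status):
    """Map machine IP -> sorted list of role names running on that machine."""
    layout = defaultdict(list)
    for proc in status["cluster"]["processes"].values():
        ip = proc["address"].split(":")[0]  # "ip:port" -> "ip"
        for role in proc.get("roles", []):
            layout[ip].append(role["role"])
    return {ip: sorted(roles) for ip, roles in layout.items()}

def crowded_machines(layout, role_a="proxy", role_b="log"):
    """Machines hosting more than one role_a, or both role_a and role_b."""
    return sorted(
        ip for ip, roles in layout.items()
        if roles.count(role_a) > 1 or (role_a in roles and role_b in roles)
    )

# Hypothetical sample for illustration only, not output from the cluster above.
sample_status = {"cluster": {"processes": {
    "p1": {"address": "10.0.0.1:4500", "roles": [{"role": "log"}]},
    "p2": {"address": "10.0.0.1:4501", "roles": [{"role": "proxy"}]},
    "p3": {"address": "10.0.0.2:4500", "roles": [{"role": "proxy"}]},
    "p4": {"address": "10.0.0.2:4501", "roles": []},
}}}

print(crowded_machines(roles_by_machine(sample_status)))  # ['10.0.0.1']
```

With a real cluster, the input would come from something like `json.load()` on a file produced by `fdbcli --exec 'status json'`; an empty result means no proxy shares a machine with a log or with another proxy.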

The fact that you’re seeing no increase in throughput makes this strongly sound like there’s something that you’re already fully utilizing. I’ve never doubled the number of proxies and logs in a setup and seen no change at saturation. Would it be relatively easy for you to run poma’s fdbtop to get a quick by-machine/role overview of your cluster’s resource utilization?


(Matt Lohier) #5

Sorry Alex I forgot to thank you before for answering and helping me out: thank you!!

I have shut down that cluster, but I will build another one and check the location of the 4 proxies. Is there a class I can configure to force/suggest the recruitment of proxies on different instances?

I will try to add fdbtop to my instances.

Shall I repeat that benchmark in 2 stages like I had done?
1- start with 10 machines, and run a test at 25,000 tps and then 50,000 tps (1 transaction = 1 read + 5 write )
2 - scale out to 20 machines, and run the same tests.

Capture status json every 10sec for a couple of minutes and run fdbtop in the foreground.

Could running the memory engine help me identify if it’s diskIO related?

Thx!


(Roman Semenov) #6

Are you using your own test software or fdb instances with test role? Maybe there is some performance issue in your testers?

Are your tests longer than 2min?


(Dave Rosenthal) #7

I imagine that looking at cpu utilization metrics for your test with 20 machines where it is in saturation will suggest the cause. Some FDB process will probably not be under proper load.

To speculate on one potential cause: I don’t know how good hot-shard splitting is these days, but it’s possible (especially in a short test) that you don’t have enough data in your test for it to be well-distributed over the nodes.

Also, 40ms for a 5-write 1-read transaction is pretty bad. I would try to run your test workload at a much lower rate and validate that 40ms is indeed the “nominal” latency. If so, something needs some help, because AWS with NVMe SSDs should be considerably better, I would think.


(Matt Lohier) #8

Hello Roman,

Thank you for fdbtop and the test-cluster! I will use fdbtop now.

We are using our own Go tester program, which creates 10,000 goroutines; each goroutine commits one transaction at a random time within every 10s window, and that happens 10 times. I was running 50 testers like that to simulate the load. The test would last 10x10sec (1m40s).
The keys are tuples of the form:

  • A, B
  • A, C
  • A, D
  • A, D, E
  • G, A, F

where each variable name is the same for a particular test run. So A could be a hot shard and data not distributed properly over the nodes?

To use fdb test role, I would:

  • install the fdb.cluster file into the tester instance
  • set class = test for each fdb process (If I have 4vCPU is it ok to configure 4 FDB processes each with test class?)
  • start the tester like that: fdbserver -r multitest

Repeat with multiple instances…
Is that right?

Thank you !


(Alex Miller) #9

There is a proxy class, which is conveniently named proxy.

I think I’m mostly just interested in the process layout that you get in your 20 machine cluster. If you can run your 50k tps test and confirm that you get the same results as last time, that would be a good sanity check.

If your problem was storage server disk IO related, I would have strongly expected doubling the number of storage servers in your cluster to greatly improve things.

I will also second daverosenthal’s comment about splitting. FDB will split shards by data volume and bandwidth. I haven’t personally run into persistent “hot shard” problems in performance tests that I’ve run, but not having enough splits pre-created in the database has definitely thrown off some of my results. If your database contains fewer shards than storage servers, then additional storage servers are no help. (I could have sworn I recalled Evan doing some work to make pre-splitting for load tests easier, but I can’t seem to find an issue or PR about it…)
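
The shards-versus-storage-servers point can be illustrated with a toy upper bound. This is not how FDB actually assigns shards to teams; it only assumes each shard lives on `replication` storage servers, so at most `shards * replication` servers can hold data. The shard counts below are made up for illustration.

```python
def servers_that_can_hold_data(shards, replication, storage_servers):
    """Upper bound on storage servers that own any shard (toy model)."""
    return min(shards * replication, storage_servers)

# Double replication, as in the cluster described in this thread.
for shards in (3, 100):
    for servers in (16, 32):
        busy = servers_that_can_hold_data(shards, replication=2, storage_servers=servers)
        print(f"{shards} shards, {servers} storage servers -> at most {busy} busy")
```

With only 3 shards, going from 16 to 32 storage servers leaves the bound at 6; with 100 shards, every server can take part.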

EDIT: You can see the full list of role names here and their meaning here.


(Matt Lohier) #10

Okay Alex, thank you!

This is what I have in my cluster of 10 machines at the moment:

ip               port    cpu%  mem%  iops  net  class        roles
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.32.74     4500    1     6     6     0    log          log
                  4501    0     4     -     0    stateless
                  4502    0     3     -     0    stateless
                  4503    5     3     -     2    stateless    cluster_controller
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.33.171    4500    1     5     9     0    storage      storage
                  4501    1     4     9     0    storage      storage
                  4502    0     3     -     0    stateless
                  4503    1     3     -     0    stateless
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.33.172    4500    1     6     8     0    storage      storage
                  4501    1     4     8     0    storage      storage
                  4502    1     3     -     0    stateless
                  4503    1     2     -     0    stateless
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.34.155    4500    1     6     9     0    storage      storage
                  4501    1     4     9     0    storage      storage
                  4502    0     3     -     0    stateless
                  4503    1     3     -     0    stateless
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.37.131    4500    1     5     8     0    storage      storage
                  4501    1     4     8     0    storage      storage
                  4502    1     4     -     0    stateless
                  4503    3     3     -     0    stateless    master
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.39.85     4500    1     4     7     0    log          log
                  4501    0     4     -     0    stateless
                  4502    1     2     -     0    stateless
                  4503    1     2     -     0    stateless
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.42.96     4500    1     5     8     0    storage      storage
                  4501    1     4     8     0    storage      storage
                  4502    1     2     -     0    proxy        proxy
                  4503    1     3     -     0    stateless
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.44.149    4500    1     4     8     0    storage      storage
                  4501    1     4     8     0    storage      storage
                  4502    1     2     -     0    proxy        proxy
                  4503    0     2     -     0    stateless
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.46.120    4500    1     5     8     0    storage      storage
                  4501    1     4     8     0    storage      storage
                  4502    0     3     -     0    stateless
                  4503    0     3     -     0    stateless
---------------  ------  ----  ----  ----  ---  -----------  --------------------
 172.31.47.4      4500    1     6     8     0    storage      storage
                  4501    1     4     8     0    storage      storage
                  4502    1     3     -     0    stateless
                  4503    1     4     -     0    stateless    resolver

In your last comment Alex, are you suggesting that I need to prepare my database by pre-creating some shards? I’m definitely not doing anything special…

Thanks


(Alex Miller) #11

Cluster layout LGTM. Did you have a benchmark running when you ran fdbtop? If not, then if you happen to grab one, it’d be interesting to see the cpu/net/disk percentages while you’re at saturation.

I think the quickest and easiest way to do the “pre-loading of data” is just to let your test run for longer, if you haven’t been already. If you’re spinning up an empty database, writing your 50k transactions/second against it for a minute, and then tearing it down, then you might never actually get to a steady state where your load is evenly distributed across all the storage servers you have.

What is the size of key-value bytes stored when you start your test, and what’s the size of key-value bytes when you end the test? (The status command should give you this number.)
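
If it is easier to capture this from `status json` than from `status`, a sketch along these lines could compare two snapshots. It assumes the JSON exposes `cluster.data.total_kv_size_bytes` and `cluster.data.partitions_count`; check those field names against your version. The snapshot values are hypothetical.

```python
def data_summary(status):
    """Extract key-value size and shard (partition) count from parsed status json."""
    data = status["cluster"]["data"]
    return {"kv_bytes": data["total_kv_size_bytes"], "shards": data["partitions_count"]}

def describe_growth(before, after):
    """Describe how much data was added and how the shard count changed."""
    added = after["kv_bytes"] - before["kv_bytes"]
    return (f"{added / 1e6:.1f} MB of key-value data added, "
            f"shards {before['shards']} -> {after['shards']}")

# Hypothetical snapshots taken at the start and end of a test run.
start = data_summary({"cluster": {"data": {"total_kv_size_bytes": 0, "partitions_count": 1}}})
end = data_summary({"cluster": {"data": {"total_kv_size_bytes": 2_500_000_000, "partitions_count": 12}}})
print(describe_growth(start, end))
```

A shard count at the end of the run that is still below the number of storage servers would suggest the load never spread across all of them.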

If you run your performance test, configured where you were getting ~40ms commits, for a longer span of time, do you see anything change?


(Alex Miller) #12

multitest takes a variety of arguments:

  • -C <clusterfile> – the same cluster as everywhere else
  • -r multitest – the obvious requirement
  • -f <testspec> – one of the files under tests/ , or something that follows a similar pattern.
  • --num_testers N – the number of tester roles to recruit

A test spec of something like

testTitle = Benchmark
testName = ReadWrite
readsPerTransactionA = 1
writesPerTransactionA = 5
alpha = 0
testDuration = 60
runConsistencyCheck = false

would be about the same as the Go client test that you’ve been running. There are two more options you can add: nodeCount = N defines the number of keys involved in the workload, and valueBytes / minValueBytes define the size of the value written in the tests.
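
To make the two extra options concrete, here is a small sketch that renders such a spec as text. The nodeCount and valueBytes values are placeholders chosen for illustration, not recommendations, and the helper name is hypothetical.

```python
def render_spec(options):
    """Render test spec options as 'key = value' lines, in insertion order."""
    return "\n".join(f"{key} = {value}" for key, value in options.items()) + "\n"

spec = render_spec({
    "testTitle": "Benchmark",
    "testName": "ReadWrite",
    "readsPerTransactionA": 1,
    "writesPerTransactionA": 5,
    "alpha": 0,
    "testDuration": 60,
    "runConsistencyCheck": "false",
    "nodeCount": 1_000_000,  # placeholder: number of keys in the workload
    "valueBytes": 100,       # placeholder: size of each written value
})
print(spec)
```

The resulting text can be saved to a file and passed to `-f`.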

(Though I didn’t take the time to throw up a mini cluster locally and actually invoke multitest against it, so I’m sorry if anything I gave in the above is wrong.)

If you are going to write a client in Go though, there is some degree of value in benchmarking FDB via your client, because it might teach you things that you’ll need to know when writing an FDB client.


(Matt Lohier) #13

I have run Roman’s FDB tester config (alpha at 0.8 @ 2R/8W) against my 10-machine cluster.
The fdbserver in test role was run from a single machine (4 processes) tho (as a first test).

But I got the following result with: fdbserver -r multitest -f test1.txt

setting up test (RandomReadWriteTest)...
running test...
RandomReadWriteTest complete
checking tests...
fetching metrics...
Metric (0, 0): Measured Duration, 300.000000, 300
Metric (0, 1): Transactions/sec, 4145.546667, 4.15e+03
Metric (0, 2): Operations/sec, 44770.380000, 4.48e+04
Metric (0, 3): A Transactions, 249190.000000, 249190
Metric (0, 4): B Transactions, 994474.000000, 994474
Metric (0, 5): Retries, 70623.000000, 70623
Metric (0, 6): Mean load time (seconds), 34.638211, 34.6
Metric (0, 7): Read rows, 3486374.000000, 3.49e+06
Metric (0, 8): Write rows, 9944740.000000, 9.94e+06
Metric (0, 9): Mean Latency (ms), 7391.912920, 7.39e+03
Metric (0, 10): Median Latency (ms, averaged), 8372.556686, 8.37e+03
Metric (0, 11): 90% Latency (ms, averaged), 8895.140648, 8.9e+03
Metric (0, 12): 98% Latency (ms, averaged), 9482.069254, 9.48e+03
Metric (0, 13): Max Latency (ms, averaged), 30979.467392, 3.1e+04
Metric (0, 14): Mean Row Read Latency (ms), 2.344409, 2.34
Metric (0, 15): Median Row Read Latency (ms, averaged), 2.067089, 2.07
Metric (0, 16): Max Row Read Latency (ms, averaged), 231.909990, 232
Metric (0, 17): Mean Total Read Latency (ms), 2.615381, 2.62
Metric (0, 18): Median Total Read Latency (ms, averaged), 2.139568, 2.14
Metric (0, 19): Max Total Latency (ms, averaged), 231.909990, 232
Metric (0, 20): Mean GRV Latency (ms), 6276.434571, 6.28e+03
Metric (0, 21): Median GRV Latency (ms, averaged), 8277.676105, 8.28e+03
Metric (0, 22): Max GRV Latency (ms, averaged), 10536.789656, 1.05e+04
Metric (0, 23): Mean Commit Latency (ms), 726.327480, 726
Metric (0, 24): Median Commit Latency (ms, averaged), 3.972530, 3.97
Metric (0, 25): Max Commit Latency (ms, averaged), 5891.954184, 5.89e+03
Metric (0, 26): Read rows/sec, 11621.246667, 1.16e+04
Metric (0, 27): Write rows/sec, 33149.133333, 3.31e+04
Metric (0, 28): Bytes read/sec, 883214.746667, 8.83e+05
Metric (0, 29): Bytes written/sec, 2519334.133333, 2.52e+06
Metric (1, 0): Measured Duration, 300.000000, 300
Metric (1, 1): Transactions/sec, 4054.603333, 4.05e+03
Metric (1, 2): Operations/sec, 43791.943333, 4.38e+04
Metric (1, 3): A Transactions, 242608.000000, 242608
Metric (1, 4): B Transactions, 973773.000000, 973773
Metric (1, 5): Retries, 70863.000000, 70863
Metric (1, 6): Mean load time (seconds), 34.758651, 34.8
Metric (1, 7): Read rows, 3399853.000000, 3.4e+06
Metric (1, 8): Write rows, 9737730.000000, 9.74e+06
Metric (1, 9): Mean Latency (ms), 7551.264475, 7.55e+03
Metric (1, 10): Median Latency (ms, averaged), 8389.169931, 8.39e+03
Metric (1, 11): 90% Latency (ms, averaged), 8908.598423, 8.91e+03
Metric (1, 12): 98% Latency (ms, averaged), 9758.384228, 9.76e+03
Metric (1, 13): Max Latency (ms, averaged), 27148.654461, 2.71e+04
Metric (1, 14): Mean Row Read Latency (ms), 2.352880, 2.35
Metric (1, 15): Median Row Read Latency (ms, averaged), 2.061367, 2.06
Metric (1, 16): Max Row Read Latency (ms, averaged), 177.886009, 178
Metric (1, 17): Mean Total Read Latency (ms), 2.624192, 2.62
Metric (1, 18): Median Total Read Latency (ms, averaged), 2.137899, 2.14
Metric (1, 19): Max Total Latency (ms, averaged), 177.886009, 178
Metric (1, 20): Mean GRV Latency (ms), 6465.125342, 6.47e+03
Metric (1, 21): Median GRV Latency (ms, averaged), 8290.379047, 8.29e+03
Metric (1, 22): Max GRV Latency (ms, averaged), 10553.521156, 1.06e+04
Metric (1, 23): Mean Commit Latency (ms), 691.432055, 691
Metric (1, 24): Median Commit Latency (ms, averaged), 3.969431, 3.97
Metric (1, 25): Max Commit Latency (ms, averaged), 5888.845682, 5.89e+03
Metric (1, 26): Read rows/sec, 11332.843333, 1.13e+04
Metric (1, 27): Write rows/sec, 32459.100000, 3.25e+04
Metric (1, 28): Bytes read/sec, 861296.093333, 8.61e+05
Metric (1, 29): Bytes written/sec, 2466891.600000, 2.47e+06
Metric (2, 0): Measured Duration, 300.000000, 300
Metric (2, 1): Transactions/sec, 4154.626667, 4.15e+03
Metric (2, 2): Operations/sec, 44870.083333, 4.49e+04
Metric (2, 3): A Transactions, 249243.000000, 249243
Metric (2, 4): B Transactions, 997145.000000, 997145
Metric (2, 5): Retries, 71818.000000, 71818
Metric (2, 6): Mean load time (seconds), 35.332993, 35.3
Metric (2, 7): Read rows, 3489575.000000, 3.49e+06
Metric (2, 8): Write rows, 9971450.000000, 9.97e+06
Metric (2, 9): Mean Latency (ms), 7411.532762, 7.41e+03
Metric (2, 10): Median Latency (ms, averaged), 8392.208815, 8.39e+03
Metric (2, 11): 90% Latency (ms, averaged), 8912.358761, 8.91e+03
Metric (2, 12): 98% Latency (ms, averaged), 9475.633383, 9.48e+03
Metric (2, 13): Max Latency (ms, averaged), 25672.571898, 2.57e+04
Metric (2, 14): Mean Row Read Latency (ms), 2.344298, 2.34
Metric (2, 15): Median Row Read Latency (ms, averaged), 2.064466, 2.06
Metric (2, 16): Max Row Read Latency (ms, averaged), 233.731270, 234
Metric (2, 17): Mean Total Read Latency (ms), 2.595222, 2.6
Metric (2, 18): Median Total Read Latency (ms, averaged), 2.136469, 2.14
Metric (2, 19): Max Total Latency (ms, averaged), 233.731270, 234
Metric (2, 20): Mean GRV Latency (ms), 6280.102715, 6.28e+03
Metric (2, 21): Median GRV Latency (ms, averaged), 8264.917135, 8.26e+03
Metric (2, 22): Max GRV Latency (ms, averaged), 10546.088219, 1.05e+04
Metric (2, 23): Mean Commit Latency (ms), 708.264522, 708
Metric (2, 24): Median Commit Latency (ms, averaged), 3.983021, 3.98
Metric (2, 25): Max Commit Latency (ms, averaged), 5891.344309, 5.89e+03
Metric (2, 26): Read rows/sec, 11631.916667, 1.16e+04
Metric (2, 27): Write rows/sec, 33238.166667, 3.32e+04
Metric (2, 28): Bytes read/sec, 884025.666667, 8.84e+05
Metric (2, 29): Bytes written/sec, 2526100.666667, 2.53e+06
Metric (3, 0): Measured Duration, 300.000000, 300
Metric (3, 1): Transactions/sec, 4069.620000, 4.07e+03
Metric (3, 2): Operations/sec, 43950.493333, 4.4e+04
Metric (3, 3): A Transactions, 244598.000000, 244598
Metric (3, 4): B Transactions, 976288.000000, 976288
Metric (3, 5): Retries, 71355.000000, 71355
Metric (3, 6): Mean load time (seconds), 34.626995, 34.6
Metric (3, 7): Read rows, 3422268.000000, 3.42e+06
Metric (3, 8): Write rows, 9762880.000000, 9.76e+06
Metric (3, 9): Mean Latency (ms), 7578.619229, 7.58e+03
Metric (3, 10): Median Latency (ms, averaged), 8371.334553, 8.37e+03
Metric (3, 11): 90% Latency (ms, averaged), 8914.507389, 8.91e+03
Metric (3, 12): 98% Latency (ms, averaged), 10172.753811, 1.02e+04
Metric (3, 13): Max Latency (ms, averaged), 31446.709633, 3.14e+04
Metric (3, 14): Mean Row Read Latency (ms), 2.341761, 2.34
Metric (3, 15): Median Row Read Latency (ms, averaged), 2.084255, 2.08
Metric (3, 16): Max Row Read Latency (ms, averaged), 227.698803, 228
Metric (3, 17): Mean Total Read Latency (ms), 2.589468, 2.59
Metric (3, 18): Median Total Read Latency (ms, averaged), 2.157927, 2.16
Metric (3, 19): Max Total Latency (ms, averaged), 227.698803, 228
Metric (3, 20): Mean GRV Latency (ms), 6468.941918, 6.47e+03
Metric (3, 21): Median GRV Latency (ms, averaged), 8295.544624, 8.3e+03
Metric (3, 22): Max GRV Latency (ms, averaged), 10549.902201, 1.05e+04
Metric (3, 23): Mean Commit Latency (ms), 690.066671, 690
Metric (3, 24): Median Commit Latency (ms, averaged), 3.896475, 3.9
Metric (3, 25): Max Commit Latency (ms, averaged), 5891.766787, 5.89e+03
Metric (3, 26): Read rows/sec, 11407.560000, 1.14e+04
Metric (3, 27): Write rows/sec, 32542.933333, 3.25e+04
Metric (3, 28): Bytes read/sec, 866974.560000, 8.67e+05
Metric (3, 29): Bytes written/sec, 2473262.933333, 2.47e+06
4 test clients passed; 0 test clients failed
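
Since each test client reports its own metrics, the cluster-wide rate is the sum across clients. A minimal sketch that parses the `Metric (client, index): name, value, short` lines shown above (the regular expression is an assumption based on this output, and only a few lines are reproduced in the sample):

```python
import re
from collections import defaultdict

METRIC_RE = re.compile(r"^Metric \((\d+), (\d+)\): (.+), ([-+0-9.eE]+), \S+$")

def parse_metrics(text):
    """Return {metric name: [value per test client]} ordered by client number."""
    by_name = defaultdict(dict)
    for line in text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            client, _index, name, value = match.groups()
            by_name[name][int(client)] = float(value)
    return {name: [vals[c] for c in sorted(vals)] for name, vals in by_name.items()}

sample_output = """\
Metric (0, 1): Transactions/sec, 4145.546667, 4.15e+03
Metric (0, 10): Median Latency (ms, averaged), 8372.556686, 8.37e+03
Metric (1, 1): Transactions/sec, 4054.603333, 4.05e+03
Metric (2, 1): Transactions/sec, 4154.626667, 4.15e+03
Metric (3, 1): Transactions/sec, 4069.620000, 4.07e+03
"""

metrics = parse_metrics(sample_output)
print(round(sum(metrics["Transactions/sec"]), 1))  # 16424.4
```

So the four test clients together achieved roughly 16,400 transactions/sec in this run.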

fdbtop was:

ip               port    cpu%   mem%  iops     net    class        roles
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    36     9     -        62     test
                  4501    17     7     -        25     test
                  4502    40     7     -        68     test
                  4503    36     7     -        61     test
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    68     15    4864     204    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    5      3     -        1      stateless    cluster_controller
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    100    14    28695    19     storage      storage
                  4501    100    16    28709    18     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    92     16    20848    17     storage      storage
                  4501    94     14    20849    18     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    96     15    19710    20     storage      storage
                  4501    97     13    19712    17     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    98     15    22552    18     storage      storage
                  4501    98     14    22622    22     storage      storage
                  4502    2      4     -        0      stateless
                  4503    13     3     -        2      stateless    master
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    68     14    4922     211    log          log
                  4501    1      4     -        0      stateless
                  4502    1      2     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.42.96     4500    100    15    21927    17     storage      storage
                  4501    100    15    22122    17     storage      storage
                  4502    65     7     -        221    proxy        proxy
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.44.149    4500    100    14    19771    17     storage      storage
                  4501    100    15    19725    15     storage      storage
                  4502    58     7     -        204    proxy        proxy
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.46.120    4500    100    14    27309    16     storage      storage
                  4501    100    13    27309    17     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.47.4      4500    96     15    19951    16     storage      storage
                  4501    96     13    19912    15     storage      storage
                  4502    1      3     -        0      stateless
                  4503    76     11    -        57     stateless    resolver
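
In the table above, the storage processes sit at 92-100% CPU while the logs and proxies are at 58-68%. A small sketch for pulling out the busiest processes from text in this layout; the column positions are inferred from this paste, and the 90% threshold and function names are arbitrary choices:

```python
def parse_fdbtop(text):
    """Parse fdbtop-style rows into dicts, carrying the IP down to rows that omit it."""
    rows, ip = [], None
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, and dashed separator rows.
        if not tokens or tokens[0] == "ip" or set(tokens[0]) == {"-"}:
            continue
        if "." in tokens[0]:  # first row of a machine starts with its IP
            ip, tokens = tokens[0], tokens[1:]
        port, cpu, _mem, _iops, _net, cls = tokens[:6]
        rows.append({"ip": ip, "port": int(port), "cpu": int(cpu),
                     "class": cls, "roles": tokens[6:]})
    return rows

def hot_processes(rows, cpu_threshold=90):
    """Return (ip, port, class) for processes at or above the CPU threshold."""
    return [(r["ip"], r["port"], r["class"]) for r in rows if r["cpu"] >= cpu_threshold]

sample = """\
ip               port    cpu%   mem%  iops     net    class        roles
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    68     15    4864     204    log          log
                  4501    0      4     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    100    14    28695    19     storage      storage
                  4501    100    16    28709    18     storage      storage
"""

print(hot_processes(parse_fdbtop(sample)))
```

On the excerpt above this lists only the two storage processes on 172.31.33.171.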

Thank you


(Matt Lohier) #14

Yes I have now run it against your suggested test template.

fdbtop was (at about 40s into it):

ip               port    cpu%  mem%  iops    net   class        roles
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.28.174    4500    23    9     -       13    test
                  4501    23    7     -       13    test
                  4502    23    7     -       13    test
                  4503    22    7     -       13    test
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.32.74     4500    67    15    1713    69    log          log
                  4501    0     4     -       0     stateless
                  4502    0     3     -       0     stateless
                  4503    8     3     -       2     stateless    cluster_controller
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.33.171    4500    26    15    6189    5     storage      storage
                  4501    26    18    6188    5     storage      storage
                  4502    0     3     -       0     stateless
                  4503    1     3     -       0     stateless
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.33.172    4500    28    21    5892    5     storage      storage
                  4501    26    18    5842    5     storage      storage
                  4502    1     3     -       0     stateless
                  4503    1     2     -       0     stateless
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.34.155    4500    26    18    5987    5     storage      storage
                  4501    27    17    5985    5     storage      storage
                  4502    0     3     -       0     stateless
                  4503    1     3     -       0     stateless
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.37.131    4500    28    20    6097    6     storage      storage
                  4501    29    18    6295    6     storage      storage
                  4502    1     4     -       0     stateless
                  4503    13    3     -       2     stateless    master
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.39.85     4500    66    14    1707    69    log          log
                  4501    1     4     -       0     stateless
                  4502    1     2     -       0     stateless
                  4503    1     2     -       0     stateless
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.42.96     4500    27    19    6168    5     storage      storage
                  4501    29    19    6168    5     storage      storage
                  4502    56    7     -       66    proxy        proxy
                  4503    1     3     -       0     stateless
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.44.149    4500    29    18    5487    5     storage      storage
                  4501    28    20    5487    5     storage      storage
                  4502    55    7     -       65    proxy        proxy
                  4503    1     2     -       0     stateless
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.46.120    4500    26    16    6198    5     storage      storage
                  4501    25    15    6198    5     storage      storage
                  4502    0     3     -       0     stateless
                  4503    0     3     -       0     stateless
---------------  ------  ----  ----  ------  ----  -----------  --------------------
 172.31.47.4      4500    28    23    6176    5     storage      storage
                  4501    27    18    6180    5     storage      storage
                  4502    1     3     -       0     stateless
                  4503    25    11    -       17    stateless    resolver

and Test results:

setting up test (Benchmark)...
running test...
Benchmark complete
checking tests...
fetching metrics...
Metric (0, 0): Measured Duration, 45.000000, 45
Metric (0, 1): Transactions/sec, 1253.622222, 1.25e+03
Metric (0, 2): Operations/sec, 7521.733333, 7.52e+03
Metric (0, 3): A Transactions, 56413.000000, 56413
Metric (0, 4): B Transactions, 0.000000, 0
Metric (0, 5): Retries, 58.000000, 58
Metric (0, 6): Mean load time (seconds), 0.000000, 0
Metric (0, 7): Read rows, 56413.000000, 5.64e+04
Metric (0, 8): Write rows, 282065.000000, 2.82e+05
Metric (0, 9): Mean Latency (ms), 5.630629, 5.63
Metric (0, 10): Median Latency (ms, averaged), 5.510092, 5.51
Metric (0, 11): 90% Latency (ms, averaged), 6.629705, 6.63
Metric (0, 12): 98% Latency (ms, averaged), 7.832289, 7.83
Metric (0, 13): Max Latency (ms, averaged), 23.015976, 23
Metric (0, 14): Mean Row Read Latency (ms), 1.117079, 1.12
Metric (0, 15): Median Row Read Latency (ms, averaged), 1.091003, 1.09
Metric (0, 16): Max Row Read Latency (ms, averaged), 4.430771, 4.43
Metric (0, 17): Mean Total Read Latency (ms), 1.120961, 1.12
Metric (0, 18): Median Total Read Latency (ms, averaged), 1.091719, 1.09
Metric (0, 19): Max Total Latency (ms, averaged), 4.430771, 4.43
Metric (0, 20): Mean GRV Latency (ms), 1.900239, 1.9
Metric (0, 21): Median GRV Latency (ms, averaged), 1.916409, 1.92
Metric (0, 22): Max GRV Latency (ms, averaged), 4.681349, 4.68
Metric (0, 23): Mean Commit Latency (ms), 2.575268, 2.58
Metric (0, 24): Median Commit Latency (ms, averaged), 2.495289, 2.5
Metric (0, 25): Max Commit Latency (ms, averaged), 7.689476, 7.69
Metric (0, 26): Read rows/sec, 1253.622222, 1.25e+03
Metric (0, 27): Write rows/sec, 6268.111111, 6.27e+03
Metric (0, 28): Bytes read/sec, 140405.688889, 1.4e+05
Metric (0, 29): Bytes written/sec, 702028.444444, 7.02e+05
Metric (1, 0): Measured Duration, 45.000000, 45
Metric (1, 1): Transactions/sec, 1264.822222, 1.26e+03
Metric (1, 2): Operations/sec, 7588.933333, 7.59e+03
Metric (1, 3): A Transactions, 56917.000000, 56917
Metric (1, 4): B Transactions, 0.000000, 0
Metric (1, 5): Retries, 57.000000, 57
Metric (1, 6): Mean load time (seconds), 0.000000, 0
Metric (1, 7): Read rows, 56917.000000, 5.69e+04
Metric (1, 8): Write rows, 284585.000000, 2.85e+05
Metric (1, 9): Mean Latency (ms), 5.586799, 5.59
Metric (1, 10): Median Latency (ms, averaged), 5.463362, 5.46
Metric (1, 11): 90% Latency (ms, averaged), 6.573200, 6.57
Metric (1, 12): 98% Latency (ms, averaged), 7.766962, 7.77
Metric (1, 13): Max Latency (ms, averaged), 21.497726, 21.5
Metric (1, 14): Mean Row Read Latency (ms), 1.101701, 1.1
Metric (1, 15): Median Row Read Latency (ms, averaged), 1.089573, 1.09
Metric (1, 16): Max Row Read Latency (ms, averaged), 4.489183, 4.49
Metric (1, 17): Mean Total Read Latency (ms), 1.095027, 1.1
Metric (1, 18): Median Total Read Latency (ms, averaged), 1.088381, 1.09
Metric (1, 19): Max Total Latency (ms, averaged), 4.489183, 4.49
Metric (1, 20): Mean GRV Latency (ms), 1.900554, 1.9
Metric (1, 21): Median GRV Latency (ms, averaged), 1.914740, 1.91
Metric (1, 22): Max GRV Latency (ms, averaged), 6.859064, 6.86
Metric (1, 23): Mean Commit Latency (ms), 2.575362, 2.58
Metric (1, 24): Median Commit Latency (ms, averaged), 2.501488, 2.5
Metric (1, 25): Max Commit Latency (ms, averaged), 7.742405, 7.74
Metric (1, 26): Read rows/sec, 1264.822222, 1.26e+03
Metric (1, 27): Write rows/sec, 6324.111111, 6.32e+03
Metric (1, 28): Bytes read/sec, 141660.088889, 1.42e+05
Metric (1, 29): Bytes written/sec, 708300.444444, 7.08e+05
Metric (2, 0): Measured Duration, 45.000000, 45
Metric (2, 1): Transactions/sec, 1253.933333, 1.25e+03
Metric (2, 2): Operations/sec, 7523.600000, 7.52e+03
Metric (2, 3): A Transactions, 56427.000000, 56427
Metric (2, 4): B Transactions, 0.000000, 0
Metric (2, 5): Retries, 57.000000, 57
Metric (2, 6): Mean load time (seconds), 0.000000, 0
Metric (2, 7): Read rows, 56427.000000, 5.64e+04
Metric (2, 8): Write rows, 282135.000000, 2.82e+05
Metric (2, 9): Mean Latency (ms), 5.623883, 5.62
Metric (2, 10): Median Latency (ms, averaged), 5.500793, 5.5
Metric (2, 11): 90% Latency (ms, averaged), 6.594181, 6.59
Metric (2, 12): 98% Latency (ms, averaged), 7.816553, 7.82
Metric (2, 13): Max Latency (ms, averaged), 22.532225, 22.5
Metric (2, 14): Mean Row Read Latency (ms), 1.117904, 1.12
Metric (2, 15): Median Row Read Latency (ms, averaged), 1.090288, 1.09
Metric (2, 16): Max Row Read Latency (ms, averaged), 4.173517, 4.17
Metric (2, 17): Mean Total Read Latency (ms), 1.119981, 1.12
Metric (2, 18): Median Total Read Latency (ms, averaged), 1.089573, 1.09
Metric (2, 19): Max Total Latency (ms, averaged), 4.173517, 4.17
Metric (2, 20): Mean GRV Latency (ms), 1.906625, 1.91
Metric (2, 21): Median GRV Latency (ms, averaged), 1.928568, 1.93
Metric (2, 22): Max GRV Latency (ms, averaged), 5.538940, 5.54
Metric (2, 23): Mean Commit Latency (ms), 2.584481, 2.58
Metric (2, 24): Median Commit Latency (ms, averaged), 2.501249, 2.5
Metric (2, 25): Max Commit Latency (ms, averaged), 9.039164, 9.04
Metric (2, 26): Read rows/sec, 1253.933333, 1.25e+03
Metric (2, 27): Write rows/sec, 6269.666667, 6.27e+03
Metric (2, 28): Bytes read/sec, 140440.533333, 1.4e+05
Metric (2, 29): Bytes written/sec, 702202.666667, 7.02e+05
Metric (3, 0): Measured Duration, 45.000000, 45
Metric (3, 1): Transactions/sec, 1249.244444, 1.25e+03
Metric (3, 2): Operations/sec, 7495.466667, 7.5e+03
Metric (3, 3): A Transactions, 56216.000000, 56216
Metric (3, 4): B Transactions, 0.000000, 0
Metric (3, 5): Retries, 47.000000, 47
Metric (3, 6): Mean load time (seconds), 0.000000, 0
Metric (3, 7): Read rows, 56216.000000, 5.62e+04
Metric (3, 8): Write rows, 281080.000000, 2.81e+05
Metric (3, 9): Mean Latency (ms), 5.390076, 5.39
Metric (3, 10): Median Latency (ms, averaged), 5.277395, 5.28
Metric (3, 11): 90% Latency (ms, averaged), 6.374598, 6.37
Metric (3, 12): 98% Latency (ms, averaged), 7.591486, 7.59
Metric (3, 13): Max Latency (ms, averaged), 22.921324, 22.9
Metric (3, 14): Mean Row Read Latency (ms), 1.107188, 1.11
Metric (3, 15): Median Row Read Latency (ms, averaged), 1.093149, 1.09
Metric (3, 16): Max Row Read Latency (ms, averaged), 4.495144, 4.5
Metric (3, 17): Mean Total Read Latency (ms), 1.113639, 1.11
Metric (3, 18): Median Total Read Latency (ms, averaged), 1.094103, 1.09
Metric (3, 19): Max Total Latency (ms, averaged), 4.495144, 4.5
Metric (3, 20): Mean GRV Latency (ms), 1.785955, 1.79
Metric (3, 21): Median GRV Latency (ms, averaged), 1.752377, 1.75
Metric (3, 22): Max GRV Latency (ms, averaged), 5.010128, 5.01
Metric (3, 23): Mean Commit Latency (ms), 2.490831, 2.49
Metric (3, 24): Median Commit Latency (ms, averaged), 2.408504, 2.41
Metric (3, 25): Max Commit Latency (ms, averaged), 8.132219, 8.13
Metric (3, 26): Read rows/sec, 1249.244444, 1.25e+03
Metric (3, 27): Write rows/sec, 6246.222222, 6.25e+03
Metric (3, 28): Bytes read/sec, 139915.377778, 1.4e+05
Metric (3, 29): Bytes written/sec, 699576.888889, 7e+05
4 test clients passed; 0 test clients failed

How does that profile look to you?


(Matt Lohier) #15

Maybe we were missing the desired transactionsPerSecond setting in the config.
I re-ran the test with it set to 25,000 to simulate my first benchmark.

Here is the fdbtop output:

ip               port    cpu%  mem%  iops     net    class        roles
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    1     9     -        0      test
                  4501    1     7     -        0      test
                  4502    1     7     -        0      test
                  4503    1     7     -        0      test
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    16    14    248      38     log          log
                  4501    0     4     -        0      stateless
                  4502    0     3     -        0      stateless
                  4503    9     4     -        4      stateless    cluster_controller
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    51    15    22659    8      storage      storage
                  4501    50    18    22646    8      storage      storage
                  4502    0     3     -        0      stateless
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    66    21    22438    12     storage      storage
                  4501    61    18    22473    12     storage      storage
                  4502    1     3     -        0      stateless
                  4503    1     2     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    76    18    22017    15     storage      storage
                  4501    77    17    21955    15     storage      storage
                  4502    0     3     -        0      stateless
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    48    20    22588    0      storage      storage
                  4501    47    18    22774    0      storage      storage
                  4502    1     4     -        0      stateless
                  4503    3     3     -        1      stateless    master
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    59    14    1319     212    log          log
                  4501    1     4     -        0      stateless
                  4502    1     2     -        0      stateless
                  4503    1     2     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.42.96     4500    70    19    22932    13     storage      storage
                  4501    73    19    23075    14     storage      storage
                  4502    32    7     -        104    proxy        proxy
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.44.149    4500    89    18    22136    18     storage      storage
                  4501    91    20    22136    18     storage      storage
                  4502    18    7     -        63     proxy        proxy
                  4503    1     2     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.46.120    4500    53    16    21428    7      storage      storage
                  4501    46    15    21428    7      storage      storage
                  4502    0     3     -        0      stateless
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.47.4      4500    62    23    21985    6      storage      storage
                  4501    59    18    22290    6      storage      storage
                  4502    1     3     -        0      stateless
                  4503    22    11    -        22     stateless    resolver

And the results:

setting up test (Benchmark)...
running test...
Benchmark complete
 checking tests...
fetching metrics...
Metric (0, 0): Measured Duration, 45.000000, 45
Metric (0, 1): Transactions/sec, 6242.711111, 6.24e+03
Metric (0, 2): Operations/sec, 37456.266667, 3.75e+04
Metric (0, 3): A Transactions, 280922.000000, 280922
Metric (0, 4): B Transactions, 0.000000, 0
Metric (0, 5): Retries, 1998.000000, 1998
Metric (0, 6): Mean load time (seconds), 0.000000, 0
Metric (0, 7): Read rows, 280922.000000, 2.81e+05
Metric (0, 8): Write rows, 1404610.000000, 1.4e+06
Metric (0, 9): Mean Latency (ms), 7.967365, 7.97
Metric (0, 10): Median Latency (ms, averaged), 7.346630, 7.35
Metric (0, 11): 90% Latency (ms, averaged), 10.076523, 10.1
Metric (0, 12): 98% Latency (ms, averaged), 15.172482, 15.2
Metric (0, 13): Max Latency (ms, averaged), 73.229551, 73.2
Metric (0, 14): Mean Row Read Latency (ms), 1.663447, 1.66
Metric (0, 15): Median Row Read Latency (ms, averaged), 1.533747, 1.53
Metric (0, 16): Max Row Read Latency (ms, averaged), 57.014704, 57
Metric (0, 17): Mean Total Read Latency (ms), 1.680833, 1.68
Metric (0, 18): Median Total Read Latency (ms, averaged), 1.550913, 1.55
Metric (0, 19): Max Total Latency (ms, averaged), 57.014704, 57
Metric (0, 20): Mean GRV Latency (ms), 2.338461, 2.34
Metric (0, 21): Median GRV Latency (ms, averaged), 2.221346, 2.22
Metric (0, 22): Max GRV Latency (ms, averaged), 33.283710, 33.3
Metric (0, 23): Mean Commit Latency (ms), 3.864651, 3.86
Metric (0, 24): Median Commit Latency (ms, averaged), 3.451347, 3.45
Metric (0, 25): Max Commit Latency (ms, averaged), 35.688400, 35.7
Metric (0, 26): Read rows/sec, 6242.711111, 6.24e+03
Metric (0, 27): Write rows/sec, 31213.555556, 3.12e+04
Metric (0, 28): Bytes read/sec, 699183.644444, 6.99e+05
Metric (0, 29): Bytes written/sec, 3495918.222222, 3.5e+06
Metric (1, 0): Measured Duration, 45.000000, 45
Metric (1, 1): Transactions/sec, 6255.822222, 6.26e+03
Metric (1, 2): Operations/sec, 37534.933333, 3.75e+04
Metric (1, 3): A Transactions, 281512.000000, 281512
Metric (1, 4): B Transactions, 0.000000, 0
Metric (1, 5): Retries, 1946.000000, 1946
Metric (1, 6): Mean load time (seconds), 0.000000, 0
Metric (1, 7): Read rows, 281512.000000, 2.82e+05
Metric (1, 8): Write rows, 1407560.000000, 1.41e+06
Metric (1, 9): Mean Latency (ms), 7.951954, 7.95
Metric (1, 10): Median Latency (ms, averaged), 7.330418, 7.33
Metric (1, 11): 90% Latency (ms, averaged), 10.005474, 10
Metric (1, 12): 98% Latency (ms, averaged), 15.769005, 15.8
Metric (1, 13): Max Latency (ms, averaged), 73.498487, 73.5
Metric (1, 14): Mean Row Read Latency (ms), 1.663162, 1.66
Metric (1, 15): Median Row Read Latency (ms, averaged), 1.516104, 1.52
Metric (1, 16): Max Row Read Latency (ms, averaged), 54.116488, 54.1
Metric (1, 17): Mean Total Read Latency (ms), 1.666029, 1.67
Metric (1, 18): Median Total Read Latency (ms, averaged), 1.519442, 1.52
Metric (1, 19): Max Total Latency (ms, averaged), 54.116488, 54.1
Metric (1, 20): Mean GRV Latency (ms), 2.328478, 2.33
Metric (1, 21): Median GRV Latency (ms, averaged), 2.221823, 2.22
Metric (1, 22): Max GRV Latency (ms, averaged), 35.168171, 35.2
Metric (1, 23): Mean Commit Latency (ms), 3.825653, 3.83
Metric (1, 24): Median Commit Latency (ms, averaged), 3.444195, 3.44
Metric (1, 25): Max Commit Latency (ms, averaged), 36.324739, 36.3
Metric (1, 26): Read rows/sec, 6255.822222, 6.26e+03
Metric (1, 27): Write rows/sec, 31279.111111, 3.13e+04
Metric (1, 28): Bytes read/sec, 700652.088889, 7.01e+05
Metric (1, 29): Bytes written/sec, 3503260.444444, 3.5e+06
Metric (2, 0): Measured Duration, 45.000000, 45
Metric (2, 1): Transactions/sec, 6255.800000, 6.26e+03
Metric (2, 2): Operations/sec, 37534.800000, 3.75e+04
Metric (2, 3): A Transactions, 281511.000000, 281511
Metric (2, 4): B Transactions, 0.000000, 0
Metric (2, 5): Retries, 1996.000000, 1996
Metric (2, 6): Mean load time (seconds), 0.000000, 0
Metric (2, 7): Read rows, 281511.000000, 2.82e+05
Metric (2, 8): Write rows, 1407555.000000, 1.41e+06
Metric (2, 9): Mean Latency (ms), 7.991420, 7.99
Metric (2, 10): Median Latency (ms, averaged), 7.362604, 7.36
Metric (2, 11): 90% Latency (ms, averaged), 10.067701, 10.1
Metric (2, 12): 98% Latency (ms, averaged), 15.345097, 15.3
Metric (2, 13): Max Latency (ms, averaged), 100.161314, 100
Metric (2, 14): Mean Row Read Latency (ms), 1.669081, 1.67
Metric (2, 15): Median Row Read Latency (ms, averaged), 1.550436, 1.55
Metric (2, 16): Max Row Read Latency (ms, averaged), 44.190168, 44.2
Metric (2, 17): Mean Total Read Latency (ms), 1.675854, 1.68
Metric (2, 18): Median Total Read Latency (ms, averaged), 1.530170, 1.53
Metric (2, 19): Max Total Latency (ms, averaged), 44.190168, 44.2
Metric (2, 20): Mean GRV Latency (ms), 2.333680, 2.33
Metric (2, 21): Median GRV Latency (ms, averaged), 2.213717, 2.21
Metric (2, 22): Max GRV Latency (ms, averaged), 34.352541, 34.4
Metric (2, 23): Mean Commit Latency (ms), 3.851104, 3.85
Metric (2, 24): Median Commit Latency (ms, averaged), 3.465176, 3.47
Metric (2, 25): Max Commit Latency (ms, averaged), 36.120653, 36.1
Metric (2, 26): Read rows/sec, 6255.800000, 6.26e+03
Metric (2, 27): Write rows/sec, 31279.000000, 3.13e+04
Metric (2, 28): Bytes read/sec, 700649.600000, 7.01e+05
Metric (2, 29): Bytes written/sec, 3503248.000000, 3.5e+06
Metric (3, 0): Measured Duration, 45.000000, 45
Metric (3, 1): Transactions/sec, 6243.355556, 6.24e+03
Metric (3, 2): Operations/sec, 37460.133333, 3.75e+04
Metric (3, 3): A Transactions, 280951.000000, 280951
Metric (3, 4): B Transactions, 0.000000, 0
Metric (3, 5): Retries, 1958.000000, 1958
Metric (3, 6): Mean load time (seconds), 0.000000, 0
Metric (3, 7): Read rows, 280951.000000, 2.81e+05
Metric (3, 8): Write rows, 1404755.000000, 1.4e+06
Metric (3, 9): Mean Latency (ms), 7.767057, 7.77
Metric (3, 10): Median Latency (ms, averaged), 7.141829, 7.14
Metric (3, 11): 90% Latency (ms, averaged), 9.879827, 9.88
Metric (3, 12): 98% Latency (ms, averaged), 15.888929, 15.9
Metric (3, 13): Max Latency (ms, averaged), 72.538614, 72.5
Metric (3, 14): Mean Row Read Latency (ms), 1.670833, 1.67
Metric (3, 15): Median Row Read Latency (ms, averaged), 1.540422, 1.54
Metric (3, 16): Max Row Read Latency (ms, averaged), 48.386097, 48.4
Metric (3, 17): Mean Total Read Latency (ms), 1.669426, 1.67
Metric (3, 18): Median Total Read Latency (ms, averaged), 1.519442, 1.52
Metric (3, 19): Max Total Latency (ms, averaged), 48.386097, 48.4
Metric (3, 20): Mean GRV Latency (ms), 2.214704, 2.21
Metric (3, 21): Median GRV Latency (ms, averaged), 2.101183, 2.1
Metric (3, 22): Max GRV Latency (ms, averaged), 34.293652, 34.3
Metric (3, 23): Mean Commit Latency (ms), 3.767820, 3.77
Metric (3, 24): Median Commit Latency (ms, averaged), 3.358364, 3.36
Metric (3, 25): Max Commit Latency (ms, averaged), 35.957575, 36
Metric (3, 26): Read rows/sec, 6243.355556, 6.24e+03
Metric (3, 27): Write rows/sec, 31216.777778, 3.12e+04
Metric (3, 28): Bytes read/sec, 699255.822222, 6.99e+05
Metric (3, 29): Bytes written/sec, 3496279.111111, 3.5e+06
4 test clients passed; 0 test clients failed

So to analyse this, I need to sum Transactions/sec (Metric (x, 1)) across all 4 test clients and check whether the total reaches the desired 25,000. It does (about 24,998 in total), and latency looks good too.
Overall I appear to be more CPU bound than disk or network bound. Would it eventually make sense to increase the number of storage servers from 2 to 3 per instance?
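
For anyone repeating the throughput check above, here is a minimal Python sketch of how the per-client values could be summed. It assumes the tester output has the `Metric (client, index): name, value, short` shape pasted in this thread. The names `METRIC_LINE`, `sum_metric` and `SAMPLE` are my own and not part of any FDB tooling.

```python
import re

# Matches lines such as "Metric (0, 1): Transactions/sec, 6242.711111, 6.24e+03".
# Groups: client id, metric index, metric name, full-precision value.
METRIC_LINE = re.compile(r"Metric \((\d+), (\d+)\): (.+), ([-+0-9.eE]+), \S+\s*$")


def sum_metric(output: str, index: int) -> float:
    """Sum the metric with the given index across all test clients."""
    total = 0.0
    for line in output.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match and int(match.group(2)) == index:
            total += float(match.group(4))
    return total


# Transactions/sec lines copied from the 25,000 tps run above,
# plus one unrelated metric to show that other indexes are ignored.
SAMPLE = """\
Metric (0, 1): Transactions/sec, 6242.711111, 6.24e+03
Metric (0, 11): 90% Latency (ms, averaged), 10.076523, 10.1
Metric (1, 1): Transactions/sec, 6255.822222, 6.26e+03
Metric (2, 1): Transactions/sec, 6255.800000, 6.26e+03
Metric (3, 1): Transactions/sec, 6243.355556, 6.24e+03
"""

if __name__ == "__main__":
    print(f"{sum_metric(SAMPLE, 1):.1f}")  # prints 24997.7
```

The same helper can be pointed at another index, for example 5 for Retries, to get a cluster-wide total instead of reading four per-client numbers by eye.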

I’ll try it with transactionsPerSecond set to 50K; I expect to see some saturation…


(Matt Lohier) #16

I repeated the test with testDuration=180 and transactionsPerSecond=50000.

fdbtop after 1min:

ip               port    cpu%   mem%  iops     net    class        roles
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    76     9     -        132    test
                  4501    76     7     -        131    test
                  4502    75     7     -        131    test
                  4503    76     7     -        131    test
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    38     14    481      365    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    6      4     -        2      stateless    cluster_controller
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    87     15    33321    28     storage      storage
                  4501    84     18    33320    29     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    94     21    29255    28     storage      storage
                  4501    93     18    29255    28     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    94     18    27242    27     storage      storage
                  4501    93     17    27240    27     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    95     20    28942    30     storage      storage
                  4501    91     18    28862    30     storage      storage
                  4502    1      4     -        0      stateless
                  4503    6      3     -        1      stateless    master
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    38     14    474      361    log          log
                  4501    1      4     -        0      stateless
                  4502    1      2     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.42.96     4500    100    19    28843    27     storage      storage
                  4501    100    19    28863    28     storage      storage
                  4502    82     7     -        487    proxy        proxy
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.44.149    4500    100    18    26499    27     storage      storage
                  4501    100    20    26496    28     storage      storage
                  4502    83     7     -        490    proxy        proxy
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.46.120    4500    86     16    28625    29     storage      storage
                  4501    90     15    30195    28     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.47.4      4500    96     23    23899    29     storage      storage
                  4501    94     18    23727    28     storage      storage
                  4502    1      3     -        0      stateless
                  4503    100    11    -        153    stateless    resolver

fdbtop after 2min:

ip               port    cpu%   mem%  iops     net    class        roles
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    63     9     -        108    test
                  4501    64     7     -        108    test
                  4502    63     7     -        108    test
                  4503    63     7     -        108    test
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    47     17    806      425    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    5      4     -        1      stateless    cluster_controller
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    84     15    30781    25     storage      storage
                  4501    94     18    30768    31     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    95     21    28824    26     storage      storage
                  4501    93     18    28811    27     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    91     18    25873    27     storage      storage
                  4501    95     18    25833    28     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    93     20    27856    32     storage      storage
                  4501    93     18    28297    32     storage      storage
                  4502    1      4     -        0      stateless
                  4503    9      3     -        2      stateless    master
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    59     16    1254     395    log          log
                  4501    1      4     -        0      stateless
                  4502    1      2     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.42.96     4500    100    19    26910    27     storage      storage
                  4501    100    19    26647    31     storage      storage
                  4502    80     7     -        370    proxy        proxy
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.44.149    4500    100    18    26541    29     storage      storage
                  4501    100    20    26331    29     storage      storage
                  4502    77     7     -        364    proxy        proxy
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.46.120    4500    96     16    35261    25     storage      storage
                  4501    92     15    34966    26     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.47.4      4500    92     24    25593    23     storage      storage
                  4501    94     19    25386    25     storage      storage
                  4502    1      3     -        0      stateless
                  4503    94     11    -        102    stateless    resolver

fdbtop after 3min:

ip               port    cpu%   mem%  iops     net    class        roles
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    74     9     -        129    test
                  4501    74     7     -        129    test
                  4502    74     7     -        129    test
                  4503    73     7     -        129    test
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    41     16    509      384    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    6      4     -        2      stateless    cluster_controller
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    81     15    31890    29     storage      storage
                  4501    100    18    31888    32     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    90     21    23123    31     storage      storage
                  4501    93     18    23114    33     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    94     19    22386    35     storage      storage
                  4501    94     18    22261    30     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    96     20    25449    33     storage      storage
                  4501    93     18    24871    32     storage      storage
                  4502    1      4     -        0      stateless
                  4503    7      3     -        1      stateless    master
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    42     16    519      394    log          log
                  4501    1      4     -        0      stateless
                  4502    1      2     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.42.96     4500    100    19    21652    32     storage      storage
                  4501    100    20    21664    35     storage      storage
                  4502    80     7     -        517    proxy        proxy
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.44.149    4500    100    18    24793    30     storage      storage
                  4501    100    20    24807    28     storage      storage
                  4502    81     7     -        514    proxy        proxy
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.46.120    4500    89     16    29725    31     storage      storage
                  4501    100    15    29766    32     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.47.4      4500    95     24    23891    29     storage      storage
                  4501    93     19    23894    30     storage      storage
                  4502    1      3     -        0      stateless
                  4503    100    11    -        170    stateless    resolver

Results:

setting up test (Benchmark)...
running test...
Benchmark complete
checking tests...
fetching metrics...
Metric (0, 0): Measured Duration, 135.000000, 135
Metric (0, 1): Transactions/sec, 7571.600000, 7.57e+03
Metric (0, 2): Operations/sec, 45429.600000, 4.54e+04
Metric (0, 3): A Transactions, 1022166.000000, 1022166
Metric (0, 4): B Transactions, 0.000000, 0
Metric (0, 5): Retries, 393975.000000, 393975
Metric (0, 6): Mean load time (seconds), 0.000000, 0
Metric (0, 7): Read rows, 1022166.000000, 1.02e+06
Metric (0, 8): Write rows, 5110830.000000, 5.11e+06
Metric (0, 9): Mean Latency (ms), 411.364913, 411
Metric (0, 10): Median Latency (ms, averaged), 284.663916, 285
Metric (0, 11): 90% Latency (ms, averaged), 769.979715, 770
Metric (0, 12): 98% Latency (ms, averaged), 1342.931747, 1.34e+03
Metric (0, 13): Max Latency (ms, averaged), 5961.887836, 5.96e+03
Metric (0, 14): Mean Row Read Latency (ms), 2.298455, 2.3
Metric (0, 15): Median Row Read Latency (ms, averaged), 2.046108, 2.05
Metric (0, 16): Max Row Read Latency (ms, averaged), 129.626989, 130
Metric (0, 17): Mean Total Read Latency (ms), 2.289309, 2.29
Metric (0, 18): Median Total Read Latency (ms, averaged), 2.046585, 2.05
Metric (0, 19): Max Total Latency (ms, averaged), 129.626989, 130
Metric (0, 20): Mean GRV Latency (ms), 110.595340, 111
Metric (0, 21): Median GRV Latency (ms, averaged), 3.789663, 3.79
Metric (0, 22): Max GRV Latency (ms, averaged), 1614.577532, 1.61e+03
Metric (0, 23): Mean Commit Latency (ms), 173.129504, 173
Metric (0, 24): Median Commit Latency (ms, averaged), 208.876848, 209
Metric (0, 25): Max Commit Latency (ms, averaged), 329.158545, 329
Metric (0, 26): Read rows/sec, 7571.600000, 7.57e+03
Metric (0, 27): Write rows/sec, 37858.000000, 3.79e+04
Metric (0, 28): Bytes read/sec, 848019.200000, 8.48e+05
Metric (0, 29): Bytes written/sec, 4240096.000000, 4.24e+06
Metric (1, 0): Measured Duration, 135.000000, 135
Metric (1, 1): Transactions/sec, 7578.688889, 7.58e+03
Metric (1, 2): Operations/sec, 45472.133333, 4.55e+04
Metric (1, 3): A Transactions, 1023123.000000, 1023123
Metric (1, 4): B Transactions, 0.000000, 0
Metric (1, 5): Retries, 393660.000000, 393660
Metric (1, 6): Mean load time (seconds), 0.000000, 0
Metric (1, 7): Read rows, 1023123.000000, 1.02e+06
Metric (1, 8): Write rows, 5115615.000000, 5.12e+06
Metric (1, 9): Mean Latency (ms), 409.823334, 410
Metric (1, 10): Median Latency (ms, averaged), 284.409046, 284
Metric (1, 11): 90% Latency (ms, averaged), 753.310442, 753
Metric (1, 12): 98% Latency (ms, averaged), 1338.480473, 1.34e+03
Metric (1, 13): Max Latency (ms, averaged), 6425.745726, 6.43e+03
Metric (1, 14): Mean Row Read Latency (ms), 2.310091, 2.31
Metric (1, 15): Median Row Read Latency (ms, averaged), 2.046824, 2.05
Metric (1, 16): Max Row Read Latency (ms, averaged), 144.459248, 144
Metric (1, 17): Mean Total Read Latency (ms), 2.285319, 2.29
Metric (1, 18): Median Total Read Latency (ms, averaged), 2.031326, 2.03
Metric (1, 19): Max Total Latency (ms, averaged), 144.459248, 144
Metric (1, 20): Mean GRV Latency (ms), 107.902974, 108
Metric (1, 21): Median GRV Latency (ms, averaged), 3.774166, 3.77
Metric (1, 22): Max GRV Latency (ms, averaged), 1615.671635, 1.62e+03
Metric (1, 23): Mean Commit Latency (ms), 174.703986, 175
Metric (1, 24): Median Commit Latency (ms, averaged), 209.763288, 210
Metric (1, 25): Max Commit Latency (ms, averaged), 329.157352, 329
Metric (1, 26): Read rows/sec, 7578.688889, 7.58e+03
Metric (1, 27): Write rows/sec, 37893.444444, 3.79e+04
Metric (1, 28): Bytes read/sec, 848813.155556, 8.49e+05
Metric (1, 29): Bytes written/sec, 4244065.777778, 4.24e+06
Metric (2, 0): Measured Duration, 135.000000, 135
Metric (2, 1): Transactions/sec, 7574.429630, 7.57e+03
Metric (2, 2): Operations/sec, 45446.577778, 4.54e+04
Metric (2, 3): A Transactions, 1022548.000000, 1022548
Metric (2, 4): B Transactions, 0.000000, 0
Metric (2, 5): Retries, 394220.000000, 394220
Metric (2, 6): Mean load time (seconds), 0.000000, 0
Metric (2, 7): Read rows, 1022548.000000, 1.02e+06
Metric (2, 8): Write rows, 5112740.000000, 5.11e+06
Metric (2, 9): Mean Latency (ms), 415.561454, 416
Metric (2, 10): Median Latency (ms, averaged), 289.265633, 289
Metric (2, 11): 90% Latency (ms, averaged), 770.749092, 771
Metric (2, 12): 98% Latency (ms, averaged), 1361.495972, 1.36e+03
Metric (2, 13): Max Latency (ms, averaged), 5571.991444, 5.57e+03
Metric (2, 14): Mean Row Read Latency (ms), 2.318312, 2.32
Metric (2, 15): Median Row Read Latency (ms, averaged), 2.044439, 2.04
Metric (2, 16): Max Row Read Latency (ms, averaged), 148.941040, 149
Metric (2, 17): Mean Total Read Latency (ms), 2.327322, 2.33
Metric (2, 18): Median Total Read Latency (ms, averaged), 2.051353, 2.05
Metric (2, 19): Max Total Latency (ms, averaged), 148.941040, 149
Metric (2, 20): Mean GRV Latency (ms), 106.984447, 107
Metric (2, 21): Median GRV Latency (ms, averaged), 3.784895, 3.78
Metric (2, 22): Max GRV Latency (ms, averaged), 1614.240170, 1.61e+03
Metric (2, 23): Mean Commit Latency (ms), 173.040868, 173
Metric (2, 24): Median Commit Latency (ms, averaged), 209.065676, 209
Metric (2, 25): Max Commit Latency (ms, averaged), 329.387188, 329
Metric (2, 26): Read rows/sec, 7574.429630, 7.57e+03
Metric (2, 27): Write rows/sec, 37872.148148, 3.79e+04
Metric (2, 28): Bytes read/sec, 848336.118519, 8.48e+05
Metric (2, 29): Bytes written/sec, 4241680.592593, 4.24e+06
Metric (3, 0): Measured Duration, 135.000000, 135
Metric (3, 1): Transactions/sec, 7583.985185, 7.58e+03
Metric (3, 2): Operations/sec, 45503.911111, 4.55e+04
Metric (3, 3): A Transactions, 1023838.000000, 1023838
Metric (3, 4): B Transactions, 0.000000, 0
Metric (3, 5): Retries, 394136.000000, 394136
Metric (3, 6): Mean load time (seconds), 0.000000, 0
Metric (3, 7): Read rows, 1023838.000000, 1.02e+06
Metric (3, 8): Write rows, 5119190.000000, 5.12e+06
Metric (3, 9): Mean Latency (ms), 410.569918, 411
Metric (3, 10): Median Latency (ms, averaged), 285.452843, 285
Metric (3, 11): 90% Latency (ms, averaged), 752.893209, 753
Metric (3, 12): 98% Latency (ms, averaged), 1341.145992, 1.34e+03
Metric (3, 13): Max Latency (ms, averaged), 6447.235346, 6.45e+03
Metric (3, 14): Mean Row Read Latency (ms), 2.274483, 2.27
Metric (3, 15): Median Row Read Latency (ms, averaged), 2.050161, 2.05
Metric (3, 16): Max Row Read Latency (ms, averaged), 140.504599, 141
Metric (3, 17): Mean Total Read Latency (ms), 2.295778, 2.3
Metric (3, 18): Median Total Read Latency (ms, averaged), 2.043724, 2.04
Metric (3, 19): Max Total Latency (ms, averaged), 140.504599, 141
Metric (3, 20): Mean GRV Latency (ms), 106.472686, 106
Metric (3, 21): Median GRV Latency (ms, averaged), 3.704548, 3.7
Metric (3, 22): Max GRV Latency (ms, averaged), 1613.441706, 1.61e+03
Metric (3, 23): Mean Commit Latency (ms), 174.114114, 174
Metric (3, 24): Median Commit Latency (ms, averaged), 209.000826, 209
Metric (3, 25): Max Commit Latency (ms, averaged), 329.356670, 329
Metric (3, 26): Read rows/sec, 7583.985185, 7.58e+03
Metric (3, 27): Write rows/sec, 37919.925926, 3.79e+04
Metric (3, 28): Bytes read/sec, 849406.340741, 8.49e+05
Metric (3, 29): Bytes written/sec, 4247031.703704, 4.25e+06
4 test clients passed; 0 test clients failed

It appears the cluster was only able to achieve ~30,000 tps.
Maybe the tester was a bit saturated, and I could run a couple more processes in the test role. But more likely I’m hitting a CPU limit on the storage processes. Are too many stateless processes sitting unused?
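
For reference, the ~30,000 tps figure can be reproduced by summing the per-client rates in the output above. This is a minimal sketch: client 0's `Transactions/sec` line is not shown in this excerpt, so its `Read rows/sec` value stands in for it (each transaction performs exactly one read, and the two values match for the other clients). The 50,000 tps target is an assumption taken from the benchmark description in the follow-up post.

```python
# Per-client throughput from the 10-machine run above.
# Client 0 uses Read rows/sec as a stand-in for Transactions/sec
# (assumption: 1 read per transaction, as described for this workload).
per_client_tps = [7571.600000, 7578.688889, 7574.429630, 7583.985185]

total_tps = sum(per_client_tps)
target_tps = 50_000  # assumed target rate for this benchmark

print(f"aggregate throughput: {total_tps:,.0f} tps")
print(f"fraction of target:   {total_tps / target_tps:.0%}")
```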

Is there anything I should tweak in the environment or cluster?

Thank you! / Matt


(Matt Lohier) #17

I added 10 more machines to bring the cluster to 20 machines. I recruited 2 more logs and 2 more proxies (on different instances from the logs) and reconfigured. I then ran the same benchmark as last time (3 min, 50,000 tps).

fdbtop after 1min:

ip               port    cpu%   mem%  iops     net    class        roles
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    84     4     -        137    test
                  4501    84     3     -        137    test
                  4502    81     3     -        140    test
                  4503    83     3     -        137    test
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.157    4500    31     5     8385     13     storage      storage
                  4501    28     4     8116     9      storage      storage
                  4502    0      3     -        0      stateless
                  4503    8      3     -        2      stateless    master
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    43     13    520      226    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    16     4     -        8      stateless    cluster_controller
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    66     15    19042    22     storage      storage
                  4501    70     18    19043    21     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    74     21    21231    25     storage      storage
                  4501    71     18    21268    21     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    76     19    19216    28     storage      storage
                  4501    67     18    19215    18     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.35.133    4500    38     6     9677     9      storage      storage
                  4501    38     5     9636     9      storage      storage
                  4502    61     3     -        242    proxy        proxy
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.36.35     4500    38     6     514      181    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    78     20    21436    25     storage      storage
                  4501    71     18    21326    20     storage      storage
                  4502    2      5     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.98     4500    39     7     8186     8      storage      storage
                  4501    38     4     8198     9      storage      storage
                  4502    61     3     -        258    proxy        proxy
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.38.195    4500    27     6     7933     9      storage      storage
                  4501    26     4     7934     11     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.38.34     4500    40     7     507      204    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.157    4500    28     6     8242     9      storage      storage
                  4501    31     4     8246     13     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.184    4500    28     6     8895     9      storage      storage
                  4501    30     5     8895     9      storage      storage
                  4502    0      2     -        0      stateless
                  4503    0      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    37     14    469      160    log          log
                  4501    1      4     -        0      stateless
                  4502    1      2     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.40.18     4500    27     7     9164     11     storage      storage
                  4501    32     5     9148     9      storage      storage
                  4502    0      3     -        0      stateless

… output truncated.

fdbtop after 2min:

ip               port    cpu%   mem%  iops     net    class        roles
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    88     4     -        146    test
                  4501    85     3     -        148    test
                  4502    84     3     -        146    test
                  4503    88     3     -        144    test
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.157    4500    42     6     12386    13     storage      storage
                  4501    41     5     12386    13     storage      storage
                  4502    0      3     -        0      stateless
                  4503    9      3     -        2      stateless    master
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    43     13    541      230    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    11     4     -        3      stateless    cluster_controller
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    62     15    17782    20     storage      storage
                  4501    70     18    18003    20     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    69     21    19883    22     storage      storage
                  4501    69     18    19971    21     storage      storage
                  4502    1      3     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    67     19    17726    19     storage      storage
                  4501    69     18    17697    30     storage      storage
                  4502    0      3     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.35.133    4500    56     6     11992    13     storage      storage
                  4501    51     5     11991    12     storage      storage
                  4502    61     3     -        274    proxy        proxy
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.36.35     4500    40     6     561      205    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    75     20    20259    26     storage      storage
                  4501    70     18    20699    22     storage      storage
                  4502    2      5     -        0      stateless
                  4503    1      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.37.98     4500    50     7     9833     12     storage      storage
                  4501    53     5     9823     12     storage      storage
                  4502    64     3     -        278    proxy        proxy
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.38.195    4500    41     7     10949    17     storage      storage
                  4501    33     5     10905    11     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.38.34     4500    43     7     561      229    log          log
                  4501    0      4     -        0      stateless
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.157    4500    47     7     11537    17     storage      storage
                  4501    41     5     11537    12     storage      storage
                  4502    0      3     -        0      stateless
                  4503    0      3     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.184    4500    46     6     11722    18     storage      storage
                  4501    45     5     11715    13     storage      storage
                  4502    0      2     -        0      stateless
                  4503    0      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    39     14    540      190    log          log
                  4501    1      4     -        0      stateless
                  4502    1      2     -        0      stateless
                  4503    1      2     -        0      stateless
---------------  ------  -----  ----  -------  -----  -----------  --------------------
 172.31.40.18     4500    37     7     10955    11     storage      storage
                  4501    41     5     10972    12     storage      storage
                  4502    0      3     -        0      stateless

Results:

Metric (0, 0): Measured Duration, 135.000000, 135
Metric (0, 1): Transactions/sec, 9438.244444, 9.44e+03
Metric (0, 2): Operations/sec, 56629.466667, 5.66e+04
Metric (0, 3): A Transactions, 1274163.000000, 1274163
Metric (0, 4): B Transactions, 0.000000, 0
Metric (0, 5): Retries, 629072.000000, 629072
Metric (0, 6): Mean load time (seconds), 0.000000, 0
Metric (0, 7): Read rows, 1274163.000000, 1.27e+06
Metric (0, 8): Write rows, 6370815.000000, 6.37e+06
Metric (0, 9): Mean Latency (ms), 330.487994, 330
Metric (0, 10): Median Latency (ms, averaged), 227.494717, 227
Metric (0, 11): 90% Latency (ms, averaged), 626.070738, 626
Metric (0, 12): 98% Latency (ms, averaged), 937.789679, 938
Metric (0, 13): Max Latency (ms, averaged), 7886.324644, 7.89e+03
Metric (0, 14): Mean Row Read Latency (ms), 5.396409, 5.4
Metric (0, 15): Median Row Read Latency (ms, averaged), 5.198002, 5.2
Metric (0, 16): Max Row Read Latency (ms, averaged), 225.377083, 225
Metric (0, 17): Mean Total Read Latency (ms), 5.365328, 5.37
Metric (0, 18): Median Total Read Latency (ms, averaged), 5.204201, 5.2
Metric (0, 19): Max Total Latency (ms, averaged), 225.377083, 225
Metric (0, 20): Mean GRV Latency (ms), 12.315299, 12.3
Metric (0, 21): Median GRV Latency (ms, averaged), 7.256031, 7.26
Metric (0, 22): Max GRV Latency (ms, averaged), 75.836897, 75.8
Metric (0, 23): Mean Commit Latency (ms), 198.458871, 198
Metric (0, 24): Median Commit Latency (ms, averaged), 194.267511, 194
Metric (0, 25): Max Commit Latency (ms, averaged), 346.850872, 347
Metric (0, 26): Read rows/sec, 9438.244444, 9.44e+03
Metric (0, 27): Write rows/sec, 47191.222222, 4.72e+04
Metric (0, 28): Bytes read/sec, 1057083.377778, 1.06e+06
Metric (0, 29): Bytes written/sec, 5285416.888889, 5.29e+06
Metric (1, 0): Measured Duration, 135.000000, 135
Metric (1, 1): Transactions/sec, 9581.481481, 9.58e+03
Metric (1, 2): Operations/sec, 57488.888889, 5.75e+04
Metric (1, 3): A Transactions, 1293500.000000, 1293500
Metric (1, 4): B Transactions, 0.000000, 0
Metric (1, 5): Retries, 630868.000000, 630868
Metric (1, 6): Mean load time (seconds), 0.000000, 0
Metric (1, 7): Read rows, 1293500.000000, 1.29e+06
Metric (1, 8): Write rows, 6467500.000000, 6.47e+06
Metric (1, 9): Mean Latency (ms), 324.933938, 325
Metric (1, 10): Median Latency (ms, averaged), 225.060225, 225
Metric (1, 11): 90% Latency (ms, averaged), 593.861818, 594
Metric (1, 12): 98% Latency (ms, averaged), 914.686441, 915
Metric (1, 13): Max Latency (ms, averaged), 5568.188667, 5.57e+03
Metric (1, 14): Mean Row Read Latency (ms), 5.603613, 5.6
Metric (1, 15): Median Row Read Latency (ms, averaged), 5.508661, 5.51
Metric (1, 16): Max Row Read Latency (ms, averaged), 222.781420, 223
Metric (1, 17): Mean Total Read Latency (ms), 5.617617, 5.62
Metric (1, 18): Median Total Read Latency (ms, averaged), 5.539656, 5.54
Metric (1, 19): Max Total Latency (ms, averaged), 222.781420, 223
Metric (1, 20): Mean GRV Latency (ms), 10.278391, 10.3
Metric (1, 21): Median GRV Latency (ms, averaged), 7.098675, 7.1
Metric (1, 22): Max GRV Latency (ms, averaged), 72.761536, 72.8
Metric (1, 23): Mean Commit Latency (ms), 198.146578, 198
Metric (1, 24): Median Commit Latency (ms, averaged), 193.711519, 194
Metric (1, 25): Max Commit Latency (ms, averaged), 347.121239, 347
Metric (1, 26): Read rows/sec, 9581.481481, 9.58e+03
Metric (1, 27): Write rows/sec, 47907.407407, 4.79e+04
Metric (1, 28): Bytes read/sec, 1073125.925926, 1.07e+06
Metric (1, 29): Bytes written/sec, 5365629.629630, 5.37e+06
Metric (2, 0): Measured Duration, 135.000000, 135
Metric (2, 1): Transactions/sec, 9646.851852, 9.65e+03
Metric (2, 2): Operations/sec, 57881.111111, 5.79e+04
Metric (2, 3): A Transactions, 1302325.000000, 1302325
Metric (2, 4): B Transactions, 0.000000, 0
Metric (2, 5): Retries, 631303.000000, 631303
Metric (2, 6): Mean load time (seconds), 0.000000, 0
Metric (2, 7): Read rows, 1302325.000000, 1.3e+06
Metric (2, 8): Write rows, 6511625.000000, 6.51e+06
Metric (2, 9): Mean Latency (ms), 328.639108, 329
Metric (2, 10): Median Latency (ms, averaged), 225.909948, 226
Metric (2, 11): 90% Latency (ms, averaged), 606.352806, 606
Metric (2, 12): 98% Latency (ms, averaged), 932.868004, 933
Metric (2, 13): Max Latency (ms, averaged), 8732.840300, 8.73e+03
Metric (2, 14): Mean Row Read Latency (ms), 5.346288, 5.35
Metric (2, 15): Median Row Read Latency (ms, averaged), 5.135536, 5.14
Metric (2, 16): Max Row Read Latency (ms, averaged), 225.169659, 225
Metric (2, 17): Mean Total Read Latency (ms), 5.301099, 5.3
Metric (2, 18): Median Total Read Latency (ms, averaged), 5.125999, 5.13
Metric (2, 19): Max Total Latency (ms, averaged), 225.169659, 225
Metric (2, 20): Mean GRV Latency (ms), 10.234027, 10.2
Metric (2, 21): Median GRV Latency (ms, averaged), 6.716967, 6.72
Metric (2, 22): Max GRV Latency (ms, averaged), 85.571527, 85.6
Metric (2, 23): Mean Commit Latency (ms), 198.373205, 198
Metric (2, 24): Median Commit Latency (ms, averaged), 193.957567, 194
Metric (2, 25): Max Commit Latency (ms, averaged), 346.805573, 347
Metric (2, 26): Read rows/sec, 9646.851852, 9.65e+03
Metric (2, 27): Write rows/sec, 48234.259259, 4.82e+04
Metric (2, 28): Bytes read/sec, 1080447.407407, 1.08e+06
Metric (2, 29): Bytes written/sec, 5402237.037037, 5.4e+06
Metric (3, 0): Measured Duration, 135.000000, 135
Metric (3, 1): Transactions/sec, 9441.125926, 9.44e+03
Metric (3, 2): Operations/sec, 56646.755556, 5.66e+04
Metric (3, 3): A Transactions, 1274552.000000, 1274552
Metric (3, 4): B Transactions, 0.000000, 0
Metric (3, 5): Retries, 627211.000000, 627211
Metric (3, 6): Mean load time (seconds), 0.000000, 0
Metric (3, 7): Read rows, 1274552.000000, 1.27e+06
Metric (3, 8): Write rows, 6372760.000000, 6.37e+06
Metric (3, 9): Mean Latency (ms), 329.680598, 330
Metric (3, 10): Median Latency (ms, averaged), 226.954460, 227
Metric (3, 11): 90% Latency (ms, averaged), 618.351698, 618
Metric (3, 12): 98% Latency (ms, averaged), 937.458992, 937
Metric (3, 13): Max Latency (ms, averaged), 6648.386240, 6.65e+03
Metric (3, 14): Mean Row Read Latency (ms), 5.419694, 5.42
Metric (3, 15): Median Row Read Latency (ms, averaged), 5.286217, 5.29
Metric (3, 16): Max Row Read Latency (ms, averaged), 203.564405, 204
Metric (3, 17): Mean Total Read Latency (ms), 5.471550, 5.47
Metric (3, 18): Median Total Read Latency (ms, averaged), 5.340815, 5.34
Metric (3, 19): Max Total Latency (ms, averaged), 203.564405, 204
Metric (3, 20): Mean GRV Latency (ms), 12.228328, 12.2
Metric (3, 21): Median GRV Latency (ms, averaged), 7.127523, 7.13
Metric (3, 22): Max GRV Latency (ms, averaged), 76.096535, 76.1
Metric (3, 23): Mean Commit Latency (ms), 199.252341, 199
Metric (3, 24): Median Commit Latency (ms, averaged), 194.741488, 195
Metric (3, 25): Max Commit Latency (ms, averaged), 346.451521, 346
Metric (3, 26): Read rows/sec, 9441.125926, 9.44e+03
Metric (3, 27): Write rows/sec, 47205.629630, 4.72e+04
Metric (3, 28): Bytes read/sec, 1057406.103704, 1.06e+06
Metric (3, 29): Bytes written/sec, 5287030.518519, 5.29e+06
4 test clients passed; 0 test clients failed
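
The metric lines above follow a regular `Metric (client, index): name, value, formatted` layout, so they can be collected per client with a small script instead of being read by eye. The sketch below is only an illustration: `parse_metrics` is a hypothetical helper, not part of FoundationDB, and the retries ratio assumes that `A Transactions` counts completed transactions.

```python
import re

# Layout observed in the output above:
#   Metric (client, index): name, value, formatted
# The name itself may contain commas, e.g. "Median Latency (ms, averaged)".
METRIC_RE = re.compile(r"^Metric \((\d+), (\d+)\): (.+), (-?\d+(?:\.\d+)?), (\S+)$")

def parse_metrics(text):
    """Return {client_id: {metric_name: value}} for every Metric line in text."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            client, _index, name, value, _formatted = match.groups()
            metrics.setdefault(int(client), {})[name] = float(value)
    return metrics

# A few lines copied from client 0 of the 20-machine run.
sample = """\
Metric (0, 1): Transactions/sec, 9438.244444, 9.44e+03
Metric (0, 3): A Transactions, 1274163.000000, 1274163
Metric (0, 5): Retries, 629072.000000, 629072
Metric (0, 10): Median Latency (ms, averaged), 227.494717, 227
4 test clients passed; 0 test clients failed
"""

parsed = parse_metrics(sample)
client0 = parsed[0]
# Assumption: "A Transactions" is the number of completed transactions.
retries_per_txn = client0["Retries"] / client0["A Transactions"]
print(f"client 0: {client0['Transactions/sec']:.0f} tps, "
      f"{retries_per_txn:.2f} retries per completed transaction")
```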

I got to about 38,000 tps, so somewhat more than the ~30,000 tps with 10 instances, but the commit latency is essentially unchanged: the average commit latency is still about 200 ms.
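
To put numbers on that comparison, here is a rough sketch using the per-client values from the two result dumps. The plain average of per-client means is an approximation (it ignores the small differences in per-client transaction counts), and client 0 of the 10-machine run again uses `Read rows/sec` in place of its missing `Transactions/sec` line.

```python
# Per-client values copied from the two benchmark outputs above.
tps_10 = [7571.600000, 7578.688889, 7574.429630, 7583.985185]   # 10 machines
tps_20 = [9438.244444, 9581.481481, 9646.851852, 9441.125926]   # 20 machines
mean_commit_ms_10 = [173.129504, 174.703986, 173.040868, 174.114114]
mean_commit_ms_20 = [198.458871, 198.146578, 198.373205, 199.252341]

def average(values):
    """Unweighted average across test clients (an approximation)."""
    return sum(values) / len(values)

total_10 = sum(tps_10)
total_20 = sum(tps_20)
speedup = total_20 / total_10

print(f"10 machines: {total_10:,.0f} tps, mean commit {average(mean_commit_ms_10):.0f} ms")
print(f"20 machines: {total_20:,.0f} tps, mean commit {average(mean_commit_ms_20):.0f} ms")
print(f"throughput gain from doubling the cluster: {speedup - 1:.0%}")
```

With these inputs, doubling the machine count yields roughly a quarter more throughput rather than twice as much, which is consistent with a bottleneck somewhere other than the storage processes.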

I’m going to try increasing the number of resolvers…


(Matt Lohier) #18

I configured the cluster with resolvers=4 and re-ran the same fdbserver test (3 min, 50,000 tps).

fdbtop:

ip               port    cpu%  mem%  iops     net    class        roles
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.28.174    4500    60    4     -        92     test
                  4501    60    3     -        93     test
                  4502    59    3     -        93     test
                  4503    59    3     -        93     test
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.32.157    4500    69    7     20194    21     storage      storage
                  4501    74    6     20211    19     storage      storage
                  4502    0     3     -        0      stateless
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.32.74     4500    66    13    1542     210    log          log
                  4501    0     4     -        0      stateless
                  4502    0     3     -        0      stateless
                  4503    16    4     -        10     stateless    cluster_controller
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.33.171    4500    67    15    16682    19     storage      storage
                  4501    70    18    16682    19     storage      storage
                  4502    0     3     -        0      stateless
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.33.172    4500    76    21    17100    22     storage      storage
                  4501    72    18    17124    22     storage      storage
                  4502    1     3     -        0      stateless
                  4503    1     2     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.34.155    4500    50    19    15794    5      storage      storage
                  4501    49    18    15793    5      storage      storage
                  4502    0     3     -        0      stateless
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.35.133    4500    89    10    23528    10     storage      storage
                  4501    70    7     23528    7      storage      storage
                  4502    46    3     -        152    proxy        proxy
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.36.35     4500    88    9     2529     283    log          log
                  4501    0     4     -        0      stateless
                  4502    0     3     -        0      stateless
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.37.131    4500    59    20    16143    10     storage      storage
                  4501    52    18    16078    12     storage      storage
                  4502    2     5     -        0      stateless
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.37.98     4500    91    9     16941    20     storage      storage
                  4501    80    6     16932    15     storage      storage
                  4502    80    3     -        265    proxy        proxy
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.38.195    4500    76    8     19611    21     storage      storage
                  4501    71    6     19624    20     storage      storage
                  4502    0     3     -        0      stateless
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.38.34     4500    94    9     4279     283    log          log
                  4501    55    5     -        53     stateless    resolver
                  4502    0     3     -        0      stateless
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.39.157    4500    78    8     17692    13     storage      storage
                  4501    77    6     17701    13     storage      storage
                  4502    21    3     -        4      stateless    master
                  4503    41    3     -        29     stateless    resolver
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.39.184    4500    82    8     19657    21     storage      storage
                  4501    77    6     19785    21     storage      storage
                  4502    0     2     -        0      stateless
                  4503    0     2     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.39.85     4500    37    10    670      71     log          log
                  4501    1     4     -        0      stateless
                  4502    1     2     -        0      stateless
                  4503    1     2     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.40.18     4500    58    8     18546    11     storage      storage
                  4501    73    6     18554    12     storage      storage
                  4502    0     3     -        0      stateless
                  4503    28    3     -        26     stateless    resolver
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.42.96     4500    43    19    16580    4      storage      storage
                  4501    48    20    16589    4      storage      storage
                  4502    77    7     -        271    proxy        proxy
                  4503    1     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.44.149    4500    55    18    17122    10     storage      storage
                  4501    64    20    17122    10     storage      storage
                  4502    69    7     -        227    proxy        proxy
                  4503    1     2     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.46.120    4500    70    16    16762    17     storage      storage
                  4501    82    15    16785    21     storage      storage
                  4502    46    3     -        40     stateless    resolver
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.47.158    4500    74    7     17540    15     storage      storage
                  4501    64    6     17535    16     storage      storage
                  4502    0     3     -        0      stateless
                  4503    0     3     -        0      stateless
---------------  ------  ----  ----  -------  -----  -----------  --------------------
 172.31.47.4      4500    61    24    14587    17     storage      storage
                  4501    65    19    14551    17     storage      storage
                  4502    1     3     -        0      stateless
                  4503    2     12    -        0      stateless

Results:

setting up test (Benchmark)...
running test...
Benchmark complete
checking tests...
fetching metrics...
Metric (0, 0): Measured Duration, 135.000000, 135
Metric (0, 1): Transactions/sec, 12494.688889, 1.25e+04
Metric (0, 2): Operations/sec, 74968.133333, 7.5e+04
Metric (0, 3): A Transactions, 1686783.000000, 1686783
Metric (0, 4): B Transactions, 0.000000, 0
Metric (0, 5): Retries, 70932.000000, 70932
Metric (0, 6): Mean load time (seconds), 0.000000, 0
Metric (0, 7): Read rows, 1686783.000000, 1.69e+06
Metric (0, 8): Write rows, 8433915.000000, 8.43e+06
Metric (0, 9): Mean Latency (ms), 24.433103, 24.4
Metric (0, 10): Median Latency (ms, averaged), 22.470236, 22.5
Metric (0, 11): 90% Latency (ms, averaged), 30.676603, 30.7
Metric (0, 12): 98% Latency (ms, averaged), 53.017616, 53
Metric (0, 13): Max Latency (ms, averaged), 228.917837, 229
Metric (0, 14): Mean Row Read Latency (ms), 5.856853, 5.86
Metric (0, 15): Median Row Read Latency (ms, averaged), 5.380630, 5.38
Metric (0, 16): Max Row Read Latency (ms, averaged), 187.579632, 188
Metric (0, 17): Mean Total Read Latency (ms), 5.803291, 5.8
Metric (0, 18): Median Total Read Latency (ms, averaged), 5.342722, 5.34
Metric (0, 19): Max Total Latency (ms, averaged), 187.579632, 188
Metric (0, 20): Mean GRV Latency (ms), 7.503022, 7.5
Metric (0, 21): Median GRV Latency (ms, averaged), 7.063389, 7.06
Metric (0, 22): Max GRV Latency (ms, averaged), 35.660267, 35.7
Metric (0, 23): Mean Commit Latency (ms), 9.893580, 9.89
Metric (0, 24): Median Commit Latency (ms, averaged), 9.062767, 9.06
Metric (0, 25): Max Commit Latency (ms, averaged), 55.891752, 55.9
Metric (0, 26): Read rows/sec, 12494.688889, 1.25e+04
Metric (0, 27): Write rows/sec, 62473.444444, 6.25e+04
Metric (0, 28): Bytes read/sec, 1399405.155556, 1.4e+06
Metric (0, 29): Bytes written/sec, 6997025.777778, 7e+06
Metric (1, 0): Measured Duration, 135.000000, 135
Metric (1, 1): Transactions/sec, 12491.244444, 1.25e+04
Metric (1, 2): Operations/sec, 74947.466667, 7.49e+04
Metric (1, 3): A Transactions, 1686318.000000, 1686318
Metric (1, 4): B Transactions, 0.000000, 0
Metric (1, 5): Retries, 73017.000000, 73017
Metric (1, 6): Mean load time (seconds), 0.000000, 0
Metric (1, 7): Read rows, 1686318.000000, 1.69e+06
Metric (1, 8): Write rows, 8431590.000000, 8.43e+06
Metric (1, 9): Mean Latency (ms), 25.181469, 25.2
Metric (1, 10): Median Latency (ms, averaged), 23.118734, 23.1
Metric (1, 11): 90% Latency (ms, averaged), 31.497478, 31.5
Metric (1, 12): 98% Latency (ms, averaged), 54.416656, 54.4
Metric (1, 13): Max Latency (ms, averaged), 315.679073, 316
Metric (1, 14): Mean Row Read Latency (ms), 6.087186, 6.09
Metric (1, 15): Median Row Read Latency (ms, averaged), 5.635738, 5.64
Metric (1, 16): Max Row Read Latency (ms, averaged), 101.420164, 101
Metric (1, 17): Mean Total Read Latency (ms), 6.057966, 6.06
Metric (1, 18): Median Total Read Latency (ms, averaged), 5.623817, 5.62
Metric (1, 19): Max Total Latency (ms, averaged), 101.420164, 101
Metric (1, 20): Mean GRV Latency (ms), 7.647929, 7.65
Metric (1, 21): Median GRV Latency (ms, averaged), 7.230759, 7.23
Metric (1, 22): Max GRV Latency (ms, averaged), 33.132792, 33.1
Metric (1, 23): Mean Commit Latency (ms), 10.048114, 10
Metric (1, 24): Median Commit Latency (ms, averaged), 9.285212, 9.29
Metric (1, 25): Max Commit Latency (ms, averaged), 49.364805, 49.4
Metric (1, 26): Read rows/sec, 12491.244444, 1.25e+04
Metric (1, 27): Write rows/sec, 62456.222222, 6.25e+04
Metric (1, 28): Bytes read/sec, 1399019.377778, 1.4e+06
Metric (1, 29): Bytes written/sec, 6995096.888889, 7e+06
Metric (2, 0): Measured Duration, 135.000000, 135
Metric (2, 1): Transactions/sec, 12489.896296, 1.25e+04
Metric (2, 2): Operations/sec, 74939.377778, 7.49e+04
Metric (2, 3): A Transactions, 1686136.000000, 1686136
Metric (2, 4): B Transactions, 0.000000, 0
Metric (2, 5): Retries, 70142.000000, 70142
Metric (2, 6): Mean load time (seconds), 0.000000, 0
Metric (2, 7): Read rows, 1686136.000000, 1.69e+06
Metric (2, 8): Write rows, 8430680.000000, 8.43e+06
Metric (2, 9): Mean Latency (ms), 23.809055, 23.8
Metric (2, 10): Median Latency (ms, averaged), 22.060156, 22.1
Metric (2, 11): 90% Latency (ms, averaged), 29.850006, 29.9
Metric (2, 12): 98% Latency (ms, averaged), 50.099850, 50.1
Metric (2, 13): Max Latency (ms, averaged), 235.465765, 235
Metric (2, 14): Mean Row Read Latency (ms), 5.736473, 5.74
Metric (2, 15): Median Row Read Latency (ms, averaged), 5.322695, 5.32
Metric (2, 16): Max Row Read Latency (ms, averaged), 177.443027, 177
Metric (2, 17): Mean Total Read Latency (ms), 5.724418, 5.72
Metric (2, 18): Median Total Read Latency (ms, averaged), 5.309343, 5.31
Metric (2, 19): Max Total Latency (ms, averaged), 177.443027, 177
Metric (2, 20): Mean GRV Latency (ms), 7.315672, 7.32
Metric (2, 21): Median GRV Latency (ms, averaged), 6.891727, 6.89
Metric (2, 22): Max GRV Latency (ms, averaged), 36.601305, 36.6
Metric (2, 23): Mean Commit Latency (ms), 9.680379, 9.68
Metric (2, 24): Median Commit Latency (ms, averaged), 8.915186, 8.92
Metric (2, 25): Max Commit Latency (ms, averaged), 54.644346, 54.6
Metric (2, 26): Read rows/sec, 12489.896296, 1.25e+04
Metric (2, 27): Write rows/sec, 62449.481481, 6.24e+04
Metric (2, 28): Bytes read/sec, 1398868.385185, 1.4e+06
Metric (2, 29): Bytes written/sec, 6994341.925926, 6.99e+06
Metric (3, 0): Measured Duration, 135.000000, 135
Metric (3, 1): Transactions/sec, 12506.903704, 1.25e+04
Metric (3, 2): Operations/sec, 75041.422222, 7.5e+04
Metric (3, 3): A Transactions, 1688432.000000, 1688432
Metric (3, 4): B Transactions, 0.000000, 0
Metric (3, 5): Retries, 71062.000000, 71062
Metric (3, 6): Mean load time (seconds), 0.000000, 0
Metric (3, 7): Read rows, 1688432.000000, 1.69e+06
Metric (3, 8): Write rows, 8442160.000000, 8.44e+06
Metric (3, 9): Mean Latency (ms), 24.029755, 24
Metric (3, 10): Median Latency (ms, averaged), 22.102594, 22.1
Metric (3, 11): 90% Latency (ms, averaged), 30.331612, 30.3
Metric (3, 12): 98% Latency (ms, averaged), 51.876068, 51.9
Metric (3, 13): Max Latency (ms, averaged), 296.543598, 297
Metric (3, 14): Mean Row Read Latency (ms), 5.882259, 5.88
Metric (3, 15): Median Row Read Latency (ms, averaged), 5.442858, 5.44
Metric (3, 16): Max Row Read Latency (ms, averaged), 228.428364, 228
Metric (3, 17): Mean Total Read Latency (ms), 5.902114, 5.9
Metric (3, 18): Median Total Read Latency (ms, averaged), 5.451441, 5.45
Metric (3, 19): Max Total Latency (ms, averaged), 228.428364, 228
Metric (3, 20): Mean GRV Latency (ms), 7.293286, 7.29
Metric (3, 21): Median GRV Latency (ms, averaged), 6.917238, 6.92
Metric (3, 22): Max GRV Latency (ms, averaged), 34.911633, 34.9
Metric (3, 23): Mean Commit Latency (ms), 9.678787, 9.68
Metric (3, 24): Median Commit Latency (ms, averaged), 8.921623, 8.92
Metric (3, 25): Max Commit Latency (ms, averaged), 54.420233, 54.4
Metric (3, 26): Read rows/sec, 12506.903704, 1.25e+04
Metric (3, 27): Write rows/sec, 62534.518519, 6.25e+04
Metric (3, 28): Bytes read/sec, 1400773.214815, 1.4e+06
Metric (3, 29): Bytes written/sec, 7003866.074074, 7e+06
4 test clients passed; 0 test clients failed

BEAUTY!
tps: 12,500x4=50,000!
commit latency: 10ms!

So it would appear that the resolvers were limiting the transaction flow. I can understand how the resolver can become a bottleneck, but could you confirm that its impact can really be this large?

Link to status json dump (after the test was run, more useful as a reference for IP/IDs): status json

I still got 4% of conflicts, so I’ll try to increase the resolvers to 8 (double the number of log processes) and see if that can improve the results…
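For anyone who wants to cross-check the totals quoted above, here is a minimal Python sketch that parses the `Metric (client, index): name, value, short` lines and sums them across tester clients. The function names `parse_metrics` and `summarize` are made up for this example. It also assumes that `Retries` divided by `A Transactions` is a fair proxy for the conflict rate, which may not hold exactly since retries can have other causes.

```python
import re
from collections import defaultdict

# Three lines per tester client, copied from the benchmark output above.
SAMPLE = """\
Metric (0, 1): Transactions/sec, 12494.688889, 1.25e+04
Metric (0, 3): A Transactions, 1686783.000000, 1686783
Metric (0, 5): Retries, 70932.000000, 70932
Metric (1, 1): Transactions/sec, 12491.244444, 1.25e+04
Metric (1, 3): A Transactions, 1686318.000000, 1686318
Metric (1, 5): Retries, 73017.000000, 73017
Metric (2, 1): Transactions/sec, 12489.896296, 1.25e+04
Metric (2, 3): A Transactions, 1686136.000000, 1686136
Metric (2, 5): Retries, 70142.000000, 70142
Metric (3, 1): Transactions/sec, 12506.903704, 1.25e+04
Metric (3, 3): A Transactions, 1688432.000000, 1688432
Metric (3, 5): Retries, 71062.000000, 71062
"""

# Metric names can contain commas, e.g. "Median Latency (ms, averaged)",
# so the name is matched greedily and the last two fields are anchored.
LINE_RE = re.compile(r"^Metric \((\d+), \d+\): (.+), ([-+0-9.eE]+), \S+$")


def parse_metrics(text):
    """Return {client_id: {metric_name: value}} from tester output lines."""
    clients = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            client, name, value = match.groups()
            clients[int(client)][name] = float(value)
    return dict(clients)


def summarize(clients):
    """Return (total transactions/sec, retries per transaction) over all clients."""
    total_tps = sum(c["Transactions/sec"] for c in clients.values())
    retries = sum(c["Retries"] for c in clients.values())
    # Assumption: "A Transactions" counts the completed transactions.
    transactions = sum(c["A Transactions"] for c in clients.values())
    return total_tps, retries / transactions


clients = parse_metrics(SAMPLE)
total_tps, retry_ratio = summarize(clients)
print(f"clients: {len(clients)}")
print(f"total tps: {total_tps:.0f}")
print(f"retries per transaction: {retry_ratio:.1%}")
```

On the four clients above this comes out at roughly 49,983 transactions/sec and a retry ratio of about 4.2%, which lines up with the 50,000 tps and 4% conflicts figures.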

It’s been fun :slight_smile: thank you


(Matt Lohier) #19

It turned out that 8 resolvers did not improve the conflicts.

I continued to raise the tps. 75,000 saturated the cluster a bit, with commit latency at 250ms. I scaled to 30 instances and increased to 6 logs, proxies and resolvers. Latency came down again to a few ms, but a small proportion of conflicts remained.

I’ll try some other combo tomorrow. Let me know if there are some particular config changes you’d like me to make.

Thank you very much for your help today!
Cheers / Matt


(Roman Semenov) #20

Generally you need more resolvers under a write-heavy load, and more still the more operations you have per write transaction.

As you can see, the storage servers at first use mostly CPU, but as data starts flowing from the transaction logs they become I/O bound. So for accurate results it may be better not to count the first minute in benchmarks (unless you plan to have a spiky load in production). You only need to increase the storage process count per disk when you still see 100% CPU usage after the test has been running for some time; this may happen in a read-heavy workload.
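One way to leave the first minute out of a benchmark figure is to average only the samples taken after a warm-up window. This is a small Python sketch, assuming you have collected `(seconds since start, value)` samples yourself; the function name `steady_state_mean` and the sample numbers are hypothetical and not taken from the runs in this thread.

```python
from statistics import mean


def steady_state_mean(samples, warmup_seconds=60.0):
    """Average the values of (elapsed_seconds, value) samples taken after warm-up."""
    kept = [value for elapsed, value in samples if elapsed >= warmup_seconds]
    if not kept:
        raise ValueError("no samples after the warm-up window")
    return mean(kept)


# Hypothetical throughput samples: (seconds since start, transactions/sec).
samples = [
    (15, 61000.0),
    (30, 58000.0),
    (45, 54000.0),
    (60, 50000.0),
    (90, 50200.0),
    (120, 49800.0),
]

print("steady state:", steady_state_mean(samples))
print("including warm-up:", steady_state_mean(samples, warmup_seconds=0.0))
```

With made-up numbers like these, the average including the warm-up is noticeably higher than the steady-state one, which is the effect to watch for.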

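Conflicts happen when 2 or more transactions access the same keys while both are uncommitted (more precisely, when one transaction reads a key that another transaction writes and commits before the first one commits). Increasing the resolver count will not help with that. What can help is achieving lower latencies, modifying your transactions so that there are fewer conflicts, or increasing the key space.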
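To see why latency and key space matter here, a rough back-of-the-envelope model can help. This Python sketch assumes keys are picked uniformly at random and independently, and uses Little's law (in-flight transactions = tps × latency) to estimate how many writes are concurrent with a given transaction. The function names and the key space sizes are made up for illustration; the real key distribution of the benchmark in this thread is not known, so this is not expected to reproduce its 4%.

```python
def inflight_transactions(tps, latency_s):
    """Little's law: average number of transactions open at the same time."""
    return tps * latency_s


def conflict_probability(tps, latency_s, key_space,
                         reads_per_txn=1, writes_per_txn=5):
    """Rough chance that one of a transaction's reads hits a concurrent write.

    Simplifying assumption: every key is chosen uniformly at random from
    key_space keys, independently of all other choices.
    """
    concurrent_writes = inflight_transactions(tps, latency_s) * writes_per_txn
    # Probability that a single read key is untouched by every concurrent write.
    p_read_untouched = (1.0 - 1.0 / key_space) ** concurrent_writes
    # The transaction is conflict-free only if all of its reads are untouched.
    return 1.0 - p_read_untouched ** reads_per_txn


# 1 read and 5 writes per transaction, as in the benchmark described above.
# The key space sizes below are hypothetical.
for latency_s in (0.025, 0.010):
    for key_space in (1_000_000, 10_000_000):
        p = conflict_probability(50_000, latency_s, key_space)
        print(f"latency={latency_s * 1000:.0f}ms keys={key_space}: {p:.4%}")
```

In this model the estimate drops both when latency goes down (fewer transactions in flight) and when the key space grows; the number of resolvers does not appear in it at all.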