ANN: benchmark FoundationDB with Go YCSB

Hi

I tried to embed the FoundationDB Go library in our Go YCSB implementation, go-ycsb, and found that it works now, e.g. ./bin/go-ycsb run foundationdb -P workloads/workloada.
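
As with standard YCSB, you load the data set first and then run the measured workload, so the full sequence looks something like this:

./bin/go-ycsb load foundationdb -P workloads/workloada
./bin/go-ycsb run foundationdb -P workloads/workloada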

I will use this to run our benchmarks, and I hope this tool can help anyone who wants to do the same thing :slight_smile:
If you find any problems, please let me know.

Sincerely,
Siddon


Are there any benchmark results yet? Can you share them with us? Thanks.

Our binaries have distributed performance tests built into them.

First, start additional fdbserver processes with the “--class test” role. These processes will act as clients that run the benchmarking workload.
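
For example (port and paths are illustrative; adjust them to your installation):

fdbserver --class test --public_address auto:4600 --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4600 --logdir /var/log/foundationdb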

Next, write a test file that describes a workload. Generally you will want to run a “ReadWrite” test; the following file is a good starting point that you can modify depending on what kind of test you want to run:

https://github.com/apple/foundationdb/blob/master/tests/RandomReadWrite.txt

Most of the parameters are self-explanatory. The alpha parameter specifies the probability of running a “B” transaction: for example, alpha=0.0 would only run “A” transactions, and alpha=1.0 would only run “B” transactions. Generally you will always want at least one read in each transaction. The file I linked is set up so that 90% of transactions are read-only, and the other 10% write 10 random key-value pairs and also read one key.
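
For reference, the shape of such a test file is roughly the following (values are illustrative; use the linked file as the canonical starting point):

testTitle=RandomReadWrite
    testName=ReadWrite
    testDuration=30.0
    transactionsPerSecond=100000
    readsPerTransactionA=10
    writesPerTransactionA=0
    readsPerTransactionB=1
    writesPerTransactionB=10
    alpha=0.1
    nodeCount=100000000
    valueBytes=100

Here alpha=0.1 gives the 90/10 split described above, and the A/B parameters encode “read 10 keys” versus “read 1 key, write 10 key-value pairs”.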

Once you have a test file, run fdbserver with the following parameters: -r multitest -f [TESTFILE] --num_testers [NUMBER OF TEST CLASS FDBSERVERS]
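
For example, with a modified copy of the test file saved as ReadWrite.txt (name arbitrary) and four test-class fdbservers running:

fdbserver -r multitest -f ReadWrite.txt --num_testers 4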

Please note that this benchmark will load data into the cluster, and does not clean up after the test is over. Therefore, you should not run it on clusters that already have data in them.

After the test finishes, the multitest process will aggregate results from all of the test processes and display them for you. The first thing to check is whether the cluster achieved the targeted transactions per second.

If it was not able to achieve it, the interesting metrics are how many transactions per second the test did achieve, and possibly the read and commit latencies. At saturation we attempt to push all of the latency into getting read versions, so that number should be very high, along with the total latency.

If it was able to achieve it, then all of the latency numbers will be interesting. The size of the data set will determine how many reads need to go to disk, so be aware of that detail when looking at read latencies.

Good performance from FoundationDB does not come out of the box. There are a number of things that will dramatically reduce your performance if you just start up a ton of fdbserver processes without any additional work. See https://apple.github.io/foundationdb/configuration.html#guidelines-for-setting-process-class for details.

In addition to the advice there, make sure to run only one transaction-class process per disk. It is okay to run multiple storage-class processes per disk, but do not run more fdbserver processes than physical cores. All the machines for the same process class should be homogeneous. If you ran half your disks with one process each and the other half with two, things would not go well. If you included one storage server running on a really old disk, the cluster would perform much worse than if you had not added that process to the cluster at all.
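
As a sketch, a foundationdb.conf following these rules on a machine with two data disks might look like this (mount points and ports are illustrative):

[fdbserver.4500]
class = transaction
datadir = /mnt/disk1/foundationdb/$ID

[fdbserver.4501]
class = storage
datadir = /mnt/disk2/foundationdb/$ID

[fdbserver.4502]
class = storage
datadir = /mnt/disk2/foundationdb/$ID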

Before running a large FoundationDB cluster, I would strongly recommend running benchmarks while varying the number of proxies, resolvers, and transaction logs to get a feel for the approximate ratio you should run. More is not always better, especially for proxies and resolvers, so I would hesitate to run too many of them; check for yourself to see what gets good performance. Always have spares for stateless and transaction processes, or performance during and after a failure will suffer dramatically.
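
Those counts are changed through fdbcli, along these lines (the numbers here are purely illustrative; benchmark to find your own ratios):

fdbcli --exec 'configure proxies=3 logs=8 resolvers=2'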


I have posted an article about the benchmark here: https://medium.com/@siddontang/benchmark-foundationdb-with-go-ycsb-f7282509dd58

I didn’t run the cluster benchmark; maybe others can give it a try.


Thanks for the info! So I gave it a go on Ubuntu 17.10 with an out-of-the-box install, starting a test server:

sudo -u foundationdb fdbserver --class test --public_address auto:11111 --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/11111 --logdir /var/log/foundationdb --listen_address public

Then I ran the following test:

fdbserver -r multitest -f foundationdb/tests/Happy.txt --num_testers 1
setting up test (Unclogged)...
running test...
Unclogged complete
checking tests...
fetching metrics...
Metric (0, 0): Transactions, 6961.000000, 6961
Metric (0, 1): Retries, 122.000000, 122
Metric (0, 2): Retries.past_version, 0.000000, 0
Metric (0, 3): Retries.commit_failed, 122.000000, 122
Metric (0, 4): Avg Latency (ms), 4.595624, 4.6
Metric (0, 5): Read rows/simsec (approx), 10441.500000, 1.04e+04
Metric (0, 6): Write rows/simsec (approx), 13922.000000, 1.39e+04
1 test clients passed; 0 test clients failed

Any idea why this shows commit failures, and why I have some N2_ReadError messages in the trace logs? Also, the datadir for the server is empty except for a processId file.

The test you are running (Happy.txt) was intended for simulation rather than performance testing. You should modify the file I mentioned earlier with the parameters you want to use instead of using the checked-in test files directly.

The reason for commit failures is that the transactions in this workload conflicted with each other. The read errors are an indication that network connections are closing for some reason. These naturally happen in low amounts, and generally do not need to be investigated.
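
To make the conflict mechanism concrete, here is a minimal sketch using the FoundationDB Go bindings (the key name and API version are arbitrary); running two copies of it against the same key will produce exactly this kind of conflict-and-retry behavior:

package main

import (
	"strconv"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
	fdb.MustAPIVersion(510)
	db := fdb.MustOpenDefault()

	// Read-modify-write on a single key. If another transaction commits a
	// write to "counter" between this transaction's read and its commit,
	// the commit fails with a conflict and Transact retries automatically.
	// Each such retry is what the Retries.commit_failed metric counts.
	_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		v, err := tr.Get(fdb.Key("counter")).Get()
		if err != nil {
			return nil, err
		}
		n := 0
		if len(v) > 0 {
			n, _ = strconv.Atoi(string(v))
		}
		tr.Set(fdb.Key("counter"), []byte(strconv.Itoa(n+1)))
		return nil, nil
	})
	if err != nil {
		panic(err)
	}
}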

The datadir for the test server should be empty because it is a stateless job. That test server only executes transactions against the real cluster; it does not store any state itself.
