How to increase the read/write throughput for foundationdb 5.2

Hi, I am using FoundationDB 5.2 on a single machine with a 24-core CPU and 54 GB of total available RAM.
I configured 8 FoundationDB server processes in foundationdb.conf; the storage engine is ssd, using the default settings.
The sequential write test results show that the write throughput is only 68.8 MB/s.
Is there any good way to increase the throughput to something like 500+ MB/s?

How have you configured process classes on your 8 server processes? How many disks do you have attached to this machine? Can you run FDB 6.2 instead? Is this bare metal or in the cloud? Are the disks locally attached or over the network? How many client processes (not threads) are you using to generate the test workload?

Hi,
here are the details of the configuration file:

## foundationdb.conf
##
## Configuration file for FoundationDB server processes
## Full documentation is available at
## https://apple.github.io/foundationdb/configuration.html#the-configuration-file

[fdbmonitor]
user = txcn
group = txcn

[general]
restart_delay = 60
## by default, restart_backoff = restart_delay_reset_interval = restart_delay
# initial_restart_delay = 0
# restart_backoff = 60
# restart_delay_reset_interval = 60
cluster_file = /etc/foundationdb/fdb.cluster
# delete_envvars =
# kill_on_configuration_change = true

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
# logsize = 10MiB
# maxlogssize = 100MiB
# machine_id =
# datacenter_id =
# class =
# memory = 8GiB
# storage_memory = 1GiB
# metrics_cluster =
# metrics_prefix =

## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section
[fdbserver.4500]
datadir = /opt/data/vanda/fdb-5.2.5/data/4500
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver

[fdbserver.4501]
datadir = /opt/data/vanda/fdb-5.2.5/data/4501
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver1

[fdbserver.4502]
datadir = /opt/data/vanda/fdb-5.2.5/data/4502
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver2

[fdbserver.4503]
datadir = /opt/data/vanda/fdb-5.2.5/data/4503
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver3

[fdbserver.4504]
datadir = /opt/data/vanda/fdb-5.2.5/data/4504
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver4

[fdbserver.4505]
datadir = /opt/data/vanda/fdb-5.2.5/data/4505
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver5

[fdbserver.4506]
datadir = /opt/data/vanda/fdb-5.2.5/data/4506
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver6

[fdbserver.4507]
datadir = /opt/data/vanda/fdb-5.2.5/data/4507
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver7

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /opt/data/vanda/fdb-5.2.5/log/backup_agent

[backup_agent.1]

I am running the FoundationDB server on only one machine. Only one SSD disk is attached to this machine, and it is mounted at the path "/opt/data/vanda".
I am using YCSB to do the benchmark test; YCSB only supports FDB 5.2, so I have to use FDB 5.2. The test is running on the local machine, on bare metal rather than in the cloud, and the SSD disk is locally attached. I use 8 FDB server processes and 1 YCSB instance to run the test; there are 24 threads in that 1 YCSB instance.
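For illustration, a single YCSB instance of that shape is launched roughly like this (the binding name and workload file are just examples, not the exact command I used):

bin/ycsb load foundationdb -s -P workloads/workloada -threads 24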

thanks

If you run fdbcli> status on this cluster, it should give you a warning like:

WARNING: A single process is both a transaction log and a storage server.
For best performance use dedicated disks for the transaction logs by setting process classes.

EDIT: Ah, the code only logs this warning if you have 10 or more processes…

So I'd suggest setting process_class=stateless on process 0, process_class=log on process 1, and process_class=storage on the rest, which will at least isolate different classes of priority work. If you search these forums for "process class", there's a number of other threads that go into process classes, and how/why to configure them.
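For example, a rough sketch of those settings using the conf file you posted (the option name in foundationdb.conf is class; only the class lines are new, and sections 4503-4507 would get class = storage as well):

[fdbserver.4500]
class = stateless
datadir = /opt/data/vanda/fdb-5.2.5/data/4500
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver

[fdbserver.4501]
class = log
datadir = /opt/data/vanda/fdb-5.2.5/data/4501
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver1

[fdbserver.4502]
class = storage
datadir = /opt/data/vanda/fdb-5.2.5/data/4502
logdir = /opt/data/vanda/fdb-5.2.5/log/fdbserver2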

However, I'm concerned that even if you continue with this test, it isn't really going to be representative of what running a real workload against a real cluster would be like, which is I assume why you're doing this benchmarking?

By running your transaction logs and storage nodes on the same disk, the sequential transaction log workload is no longer sequential for the disk, and it will be fighting for fsyncs against the rest of your processes. One machine with eight attached disks would be better.

Though I greatly appreciate that someone contributed a YCSB client for FDB, YCSB is not a great benchmarking workload for FDB, as the results don't extrapolate out to other workloads well. The implementation did one transaction per key read or per key written, which means it's actually benchmarking the GetReadVersion operation that starts a transaction and not reads or writes. Real world workloads tend to do multiple operations per transaction, which amortizes the cost of starting a transaction.

FoundationDB clients also have only one thread that handles FDB network traffic in the background, which can be easily saturated by a sustained benchmarking workload. Running multiple client processes would be the better thing to do.

YCSB is written against the 5.2 bindings, but you can still download a 6.2 client library from foundationdb.org, point to it with FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY, and run against a 6.2 cluster.
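For example, something like this in the environment of the YCSB process (the directory is a placeholder for wherever you unpack the 6.2 client library; network options can be supplied to the client as FDB_NETWORK_OPTION_* environment variables):

export FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/opt/fdb-client-6.2/lib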

If you search this forum for "multitest", you'll find examples of the tooling that's built into fdbserver that we use for benchmarking FoundationDB.

Overall, if you're looking for 500 MB/s per host/disk, then FDB isn't going to be able to provide that. (And some SSDs can't either.)


Thanks very much for your detailed and very helpful explanation.

The purpose of doing this kind of benchmark is to check how FDB performs when saving data on different SSDs; in other words, to check whether FDB works much better on SSD 1 or on SSD 2.
So we need higher read/write throughput. If the read/write throughput is only below 100 MB/s, or the disk %util is very low (say only 20%), it means that only very little data goes through the disk, and we can't identify which disk is better for FDB.

If I attach multiple SSD disks to one machine for the test, I guess the read/write throughput (MB/s) shown by iostat could still be very low even though FDB's performance (ops) would increase; am I right?

By the way, do you know how to push the disk %util up to 80%-90% when using FDB? Or are there any standalone tools that can benchmark FDB while driving %util to nearly 90%-100%, or the disk's read/write throughput to at least 200-300 MB/s? Or do we need to write such a tool ourselves?
I searched for "multitest" in the forum, and I need time to digest those threads.

Thanks again,

Oh, that's a very different motivation than I had assumed. In that case, I'd think that the storage engine would be far more interesting than the transaction log, and I think you can do this in a much easier way than actually running FDB.

A transaction log largely just does sequential writes, and in its pathological case, random reads. If you set up an fio test that sequentially appended 64KB chunks to a file and fsync'd after each one, then that's a decent rough approximation of a transaction log. If a storage server falls far behind, then add in bursts of ten 64KB reads.
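A minimal fio job along those lines might look like the sketch below; the filename and size are placeholders, and fsync=1 issues an fsync after every 64KB write, mirroring the per-commit fsync behavior:

[tlog-approx]
ioengine=sync
rw=write
bs=64k
fsync=1
size=8g
filename=/opt/data/vanda/fio-tlog-test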

For storage servers, I think @SteavedHams has a redwood storage engine benchmark program that you could use. Steve, do you agree, and could you explain how to run it?

thanks for the great help! looking forward to the redwood storage engine benchmark program

hi,
Could you let me know what the default block size/page size is for FoundationDB and how we can change it? I searched the documentation and failed to find any clue.

Thanks,

With the ssd (SQLite) engine, the page size can't be changed. Redwood will support configurable page sizes, however.

If you would like to do disk performance testing with Redwood, you will have to build my repo's default branch from source. The page size must also be changed in the source code currently. The code is here:

and the ā€œsetā€ test can be run with this command:

fdbserver -r test -f tests/RedwoodPerfSet.txt

This is a write-only benchmark that writes keys and values in batches based on a bunch of configurable parameters which are all in source code, so to change the write pattern you have to modify those. The default is to write 20 MB batches composed of large groups of sequential writes, which will be disk-bound.

The pageSize can be changed here, and the other workload parameters are further down.

thanks very much! I will try.

hi,
I checked the code of FoundationDB 5.2.5 (which I am using currently), and it shows:

#define SQLITE_DEFAULT_PAGE_SIZE 1024
#ifndef SQLITE_MAX_DEFAULT_PAGE_SIZE
#define SQLITE_MAX_DEFAULT_PAGE_SIZE 8192
#endif

I am using ssd as the storage engine. My questions are:

  1. The default page size is 1 KB and the max default page size is 8 KB? Is that the right way to understand it?
  2. What is the purpose of SQLITE_MAX_DEFAULT_PAGE_SIZE?

Thanks,
Luo

No. These are internal SQLite defaults. FDB sets the page size to 4096 when creating a new SQLite database file. That line is here, and changing the size to something else is completely untested. I doubt it will work, or it may work with worse performance, because of how SQLite's file access works in FDB.

SQLite's file operations are routed through a caching layer called AsyncFileCached. There are 2 page caches instanced, a 64k page cache for SQLite WAL (write-ahead-log) files and a 4k page cache for SQLite's main page file and all other files. SQLite uses "zero copy reads" which AsyncFileCached requires to be exactly pageSize (4096) in length or it throws an io_error exception. It's possible (I have not checked) that when this happens it will fall back to non zero-copy reads, but if so performance will be terrible. Without zero-copy reads, most page accesses by SQLite will incur a page copy, even pages that are already in the cache (FDB's cache) because from SQLite's perspective it is reading from the filesystem. SQLite has an internal page cache but it is intentionally configured to be very small so that FDB's code can manage which pages are cached and can share the page cache capacity across all users of AsyncFileCached.

You can try modifying AsyncFileCache.actor.h/cpp to change the 4k page size to some other page size. It might work to just change all instances of 4096 in those files to some other page size, but you will also have to instantiate a FastAllocator<> for the new page size if it is larger than 8192, or you will get compile errors. You could also just not change the checks which are trying to use FastAllocator<4096>, but then it will use slower memory allocation for each cache entry creation.


Hi,
I have a question here,
I have changed to FoundationDB 6.2.11 now, so the Redwood storage engine will be used by default if I use the command line below to initialize the database:
fdbcli --exec 'configure single new ssd'

am I right?

thanks,

No. Redwood is still experimental, not the default, and be warned that its files on disk are NOT forward-compatible. Also, 6.2.11 is rather old for Redwood and missing a lot of optimization. I suggest you try the latest master branch from the main FDB repository if you want to test with it. Master branch was updated yesterday with the latest version.

The configuration command is ssd-redwood-experimental.
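So, adapting the command you posted above:

fdbcli --exec 'configure single new ssd-redwood-experimental'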

Do you mean I need to clone the latest master branch and then compile from source code before I can use the latest Redwood?
Or can I download the latest release version, Release 6.2.17? I noticed that Release 6.2.17 was just made yesterday.

Thanks,

The Redwood update is only in master and not release-6.2, so 6.2.17 doesn't have it. A future 6.2 release likely will, but there's not a published build that you can pull with an updated Redwood right now.

hi,
I cloned the master branch and compiled it, so how do I run FDB now? When I installed FDB from the rpm package, I just followed the instructions in the official documentation, but after compiling from source code I couldn't find out how to run it.
Is it like this below:
/usr/sbin/fdbserver --class storage --cluster_file /etc/foundationdb/fdb.cluster --datadir /opt/data/4500 --listen_address public --logdir /opt/data/log/fdbserver --public_address auto:4500

Thanks,

Hi,
I cloned the latest master source code and compiled and installed FDB successfully, and then I used the command line below to start FDB and create the database; it works fine.
Do you think this is the correct way to use FDB, since I installed it from source code rather than from the rpm package?
The command line to start FDB:

/opt/app/fdb/rwbuild/bin/fdbserver --class storage --cluster_file ./fdb.cluster --datadir /opt/data/fdb-6.2/data/4000 --listen_address public --logdir /opt/data/fdb-6.2/log/fdbserver --public_address auto:4000

thanks,

It looks fine to me. (I presume, though, that you have one non-storage class process?)

You're welcome to also follow the instructions in the README to build an RPM, and then install the RPM if that'd be an easier or more familiar installation process for you.
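The general shape is something like the following sketch, but treat the README as the authoritative reference for the exact toolchain, dependencies, and package targets:

git clone https://github.com/apple/foundationdb.git
mkdir build && cd build
cmake -G Ninja ../foundationdb
ninja
cpack -G RPM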

Yes, I have 2 non-storage class processes: one log and one stateless; the others are storage.
Unfortunately, I encountered one problem and recorded it here:

So I think the best way is to build an RPM and then install the RPM.

Thanks,