There’s a lack of documentation on how to reproduce the results of the official benchmark from the docs, and I see that many developers are getting only a fraction of that performance. I think it is a good idea to make a reproducible reference cluster deployment that produces good benchmark results: it can be a good starting point for your custom cluster and is a great tool for understanding how to tune fdb performance.
https://github.com/poma/fdb-test-cluster
These terraform and packer scripts deploy a high-performance fdb cluster with benchmarking and monitoring tools. By default the cluster consists of 8x m5d.2xlarge (8 cores, NVMe SSD) fdb instances running in double redundancy mode with the ssd storage engine, plus 4x tester instances of the same size. The scripts work by creating a base AMI and then deploying a cluster of instances from it with custom config files. This lets you quickly deploy, benchmark, tune, and tear down a cluster so you only pay for a short uptime; the default cluster should cost around $6/hr. For detailed deployment info see the readme in the repo. The scripts are based on this thread.
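A rough sketch of the deploy/benchmark/destroy loop looks like the following; the actual template names and variables are documented in the repo’s readme, so treat the file names here as placeholders:

```
# Build the base AMI with fdb and the benchmarking tools preinstalled
# (the packer template path is a placeholder).
packer build packer/fdb.json

# Stand the cluster up, benchmark it, then tear it down to stop paying.
terraform init
terraform apply
# ... run benchmarks, inspect results ...
terraform destroy
```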
Performance is tested using the built-in fdb testing tools, to reduce the possibility of inaccurate results due to inefficiencies in the test software. To get the best performance out of your cluster it is better to treat fdb as a set of microservices that each have their own resource requirements and need separate monitoring and scaling. Optimal role ratios and placement depend on your workload, machines, and topology, so there is no golden rule: you need to actually test it and find the bottlenecks. You can use the included fdbtop tool to monitor the performance of each process.
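For reference, fdb’s built-in workload runner is driven by a plain-text test spec and run with `fdbserver` in multitest mode, which farms the load out to the processes running with the test class. The spec below is only an illustrative sketch of a 90/10 read/write workload, not the actual test file from the repo, and the parameter values are made up for the example:

```
testTitle=RandomReadWrite
testName=ReadWrite
testDuration=60.0
transactionsPerSecond=100000
readsPerTransactionA=9
writesPerTransactionA=1
nodeCount=10000000
valueBytes=100
```

It is launched from any machine that can reach the cluster with something like `fdbserver -r multitest -f read_write.txt -C fdb.cluster` (file names are placeholders).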
Here are some specific features of the cluster:
- all process roles are specified explicitly (see the config sketch after this list)
- log and storage roles are placed on different machines so that they don’t compete for the same disk and logs keep their low latency
- the write subsystem processes (master, proxies, resolvers, logs) are placed close together for faster network communication. If your cluster is limited by network bandwidth, it might be better to do the opposite and spread out proxies and logs so that you saturate more network links.
- AWS NVMe disks are fast enough (~57k write iops for m5d.2xlarge) that they require multiple storage processes to saturate them even with the ssd storage engine, especially on a read-heavy workload. In addition, on AWS, disk performance scales with core count. If you have slower or network-attached disks you might want to reduce the number of storage processes per disk to 1-2.
- the master role runs in a separate process to make sure it has enough resources to generate commit versions with low latency
- although overall throughput can be increased by adding storage processes to the log instances (their disk is not saturated), this may degrade latency in some workloads
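To make the explicit role assignment concrete, here is a trimmed foundationdb.conf sketch in the spirit of the above. The ports, process counts, and the per-role classes beyond the commonly documented storage/transaction/stateless ones are illustrative assumptions, not the repo’s exact config:

```
## Storage machine: several storage processes share the local NVMe disk.
[fdbserver.4500]
class = storage
[fdbserver.4501]
class = storage
[fdbserver.4502]
class = storage
[fdbserver.4503]
class = storage

## Log machine: write-path roles kept together, away from storage traffic.
[fdbserver.4500]
class = transaction
[fdbserver.4501]
class = proxy
[fdbserver.4502]
class = resolution
[fdbserver.4503]
class = master

## Tester machine: processes that run the benchmark workloads.
[fdbserver.4500]
class = test
```

Overall role counts are then set once through fdbcli, e.g. `configure double ssd proxies=5 logs=8 resolvers=2`; the numbers are just an example and should be tuned per the notes above.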
I’m still in the process of figuring out how to better configure the cluster, so any feedback or improvements to my configs are greatly appreciated.