Asking for recommendations on running a large cluster

Hi, we plan to run a large cluster in a production environment on 16 EC2 instances.
The official documentation gives the recommended minimum number of processes per class:
https://apple.github.io/foundationdb/configuration.html#guidelines-for-setting-process-class
class=transaction (log server) processes: 8 (active) + 2 (standby).
class=stateless processes: 1 (GRV proxy) + 3 (commit proxies) + 1 (resolver) + 1 (cluster controller) + 1 (master) + 2 (standby).
The process types include grv_proxies, commit_proxies, resolvers, and logs.
I am confused because, when I checked cluster_spec.md from fdb-kubernetes-operator, I found there are more process classes, e.g., storage, cluster_controller, and stateless.
I am not sure which is the best way to run a large cluster: use fdb-kubernetes-operator to deploy the cluster, or deploy the cluster as the official documentation describes.

The following document has some more context on how to scale/shrink a cluster with the operator: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/manual/scaling.md. One thing to note: the operator will automatically add the 2 standby processes if you run with triple replication (otherwise only one standby is added).
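For context, the replication mode is set in the cluster spec's databaseConfiguration section. This is a minimal sketch based on my reading of the operator's cluster spec (treat the field names as an assumption and check cluster_spec.md); it requests triple replication, under which the operator adds the two standby log processes:

```yaml
# Sketch (field names assumed from the operator's cluster spec):
# with redundancy_mode: triple, the operator adds 2 standby log
# processes instead of 1.
spec:
  databaseConfiguration:
    redundancy_mode: triple
```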

I am confused because, when I checked cluster_spec.md from fdb-kubernetes-operator, I found there are more process classes, e.g., storage, cluster_controller, and stateless.

Most of the other process counts in the operator can be ignored; they are only needed if you want to run a dedicated Pod for the cluster controller or any other process role.

The number of storage processes (and disk size) defines the physical capacity of your cluster, e.g. if you create 20 storage processes, each with a 500 GB disk, you'll get a cluster with a physical capacity of ~10 TB.
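As a sketch of how that capacity is configured, the disk size per storage process can be set through the operator's volume claim template. The field names below follow my reading of the operator's cluster spec, so treat them as an assumption:

```yaml
# Sketch: 20 storage processes x 500Gi volumes ~= 10 TB physical capacity.
# volumeClaimTemplate field names assumed from the operator's cluster spec.
spec:
  processCounts:
    storage: 20
  processes:
    general:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: "500Gi"
```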

The following config will create all the required processes (I just picked 20 storage processes because I used them in my example above):

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: your-cluster
spec:
  version: 7.1.11
  processCounts:
    storage: 20
    log: 8
    stateless: 7

I am not sure which is the best way to run a large cluster: use fdb-kubernetes-operator to deploy the cluster, or deploy the cluster as the official documentation describes.

That depends in the end on the platform you want to use. If you want to use Kubernetes (e.g. EKS), the operator is a good choice; it implements most of the operational work. If you want to use EC2 directly, you will probably need to build your own automation tooling around FDB.

In any case, I would suggest that you test your initial setup with either the operator or a manually bootstrapped cluster. Once the cluster is up, you can run some manual verification tasks, benchmark it, and test different configurations. Using the operator on Kubernetes is probably the fastest and easiest way to create an FDB cluster.


Thanks a lot, this is very helpful to me. I have another question :grinning:
The config you suggested includes 8 log and 7 stateless processes; how do you calculate the number of log and stateless processes?

Those numbers were more or less picked based on the referenced doc. I'm not aware of any formula for FDB cluster sizes, since this depends a lot on the underlying hardware (e.g. disk sizes) and your workload characteristics. If you have an application or a tool that can mimic your workload, I would suggest testing different setups to see how many processes you need.
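To tie the numbers back to the guidelines linked above: stateless: 7 is the sum of the documented roles (1 GRV proxy + 3 commit proxies + 1 resolver + 1 cluster controller + 1 master), and log: 8 matches the 8 active transaction logs, with standbys added by the operator. A commented sketch of that mapping, as a starting point rather than a formula:

```yaml
# Sketch: mapping the documented minimums onto processCounts.
# These counts come from the configuration guidelines linked above;
# adjust them after benchmarking your own workload.
processCounts:
  log: 8        # 8 active transaction logs (standbys added by the operator)
  stateless: 7  # 1 GRV proxy + 3 commit proxies + 1 resolver
                # + 1 cluster controller + 1 master = 7
```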