Looking to tweak FDB performance on Kubernetes with the FDB Operator

Hi all,

I’m looking for some extra information on optimizing the FoundationDB operator running on Kubernetes. I’m after general recommendations rather than specifics for my cluster. It’s up to me to handle the minutiae of our systems, and I don’t expect anyone else to debug them for us.

I found this page: Performance — FoundationDB 7.1, but it’s a bit high level. I’m looking for cluster configuration help specific to the operator (or the differences between VM and operator configuration). If there’s an FAQ, a guide to common performance issues, or some other performance guide I can look into, that would be much appreciated. Informal tips are welcome too.

Also, any general information on how performance compares between the VM-based setup and the Kubernetes-based setup would be helpful. I’m seeing a 60% reduction in performance using the operator versus raw VMs. Again, I’m not asking anyone to debug my specific situation, just to point me towards a guide I can use to make sure I’ve crossed the t’s and dotted the i’s in terms of operator parameters.

Thanks in advance for any pointers.

We run FDB in a single Kubernetes cluster across multiple AZs. Some things that might be handy:

  1. Set the CPU request to 1 with no CPU limit (an FDB process is single-threaded, so there is no need for more).
  2. Set the pod memory to match what FDB expects.
  3. Try running in a single zone (if you’re concerned about resiliency or about read latency in other zones, look into satellite logs as a way to span multiple zones).
  4. Because of 1., use separate pods for separate components (tlog, storage). We use the snapshot backup feature, so we also run the coordinators separately.
  5. We spread specific process groups across nodes and zones with topology spread constraints to get the resiliency level we need.
  6. Check out the gp3 EBS volume type (the default, gp2, is much slower). We plan to add NVMe, preferably as a read cache, in the future, which could improve this even more.
  7. We use a sidecar log collector and investigate any event with a severity level above 30. We also have our own FDB exporter and dashboard, which help too.
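
To make point 7 a bit more concrete, here is a minimal sketch of the kind of filter a log collector could apply. It is not our actual collector. It assumes the processes write JSON-formatted trace logs, one event per line, with the severity in a `Severity` field stored as a string; the function name `events_above` is made up for this example.

```python
import json
from typing import Iterable, Iterator


def events_above(lines: Iterable[str], threshold: int = 30) -> Iterator[dict]:
    """Yield trace events whose severity is strictly greater than `threshold`.

    Assumption: each line is one JSON object with a "Severity" field
    (for example "10", "20", "30", "40"). Lines that do not parse are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial write or non-JSON line
        if not isinstance(event, dict):
            continue
        try:
            severity = int(event.get("Severity", 0))
        except (TypeError, ValueError):
            continue  # severity missing or not numeric
        if severity > threshold:
            yield event
```

Passing it an open trace file, or `sys.stdin` when piping from the sidecar, yields only the events worth a closer look. If your trace logs are in the XML format instead, the same idea applies but the parsing step would need to change.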