Which publicIPSource to use when running FDB with multi-region and Kubernetes?

I’m evaluating how to build highly available FDB clusters, and we want to rely on the multi-region configuration described here: Multi Region Replication · apple/foundationdb Wiki · GitHub.

So far each of our clusters has lived within a single K8s cluster in one region, so we set publicIPSource to service, which has served us well and provided some stability.

But using services with a ClusterIP is far more complicated in a multi-region, multi-K8s-cluster setup, because ClusterIP addresses are not meant to be routable outside of a K8s cluster.

That leaves us with 2 other choices:

  • nodeIP
  • DNS name

NodeIP feels really dangerous to me: should a majority of controller pods be restarted, you now have a cluster that is not available. I have been burnt in the past by too many pods restarting at once, and by our cloud provider (Azure) upgrading nodes behind our back, so I can't trust this 100% for something that needs to be highly available.

I guess this leaves only DNS, but I have a vague, uneasy feeling about it too, because once in a while something weird happens with DNS between the clusters. I don’t have a concrete example of that, just the feeling that I would run into issues.

With that said, I’m wondering: for those of you running multi-region FDB on K8s, what are you using, and why?

The actual setup depends on your cloud provider and what they recommend for multi-Kubernetes setups. (I'm not sure whether Azure supports providing a routable subnet to the AKS cluster.)

I guess this leaves only DNS, but I have a vague, uneasy feeling about it too, because once in a while something weird happens with DNS between the clusters. I don’t have a concrete example of that, just the feeling that I would run into issues.

DNS will only be used for the initial connection to the cluster; after that, the clients will make use of the IP addresses of the Pods. So DNS is not in the critical path. For completeness, there was an issue with DNS reported here: FDB regular recovery with operator - #10 by jlemaes
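
As a rough sketch of what this could look like in the operator's FoundationDBCluster spec: the fragment below assumes the v1beta2 API and its `routing` section, that Pod IPs are routable between your Kubernetes clusters, and that DNS names for the Pods resolve across clusters. The cluster name is made up, and the field names should be checked against the operator version you run.

```yaml
# Partial spec only; version, process counts, and the multi-region
# database configuration are omitted.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: example-cluster   # hypothetical name
spec:
  routing:
    # Use the Pod IP as the public address of each process.
    # Assumption: Pod IPs are routable between the Kubernetes clusters.
    publicIPSource: pod
    # Write DNS names instead of IP addresses into the cluster file.
    # Assumption: the FDB version in use supports DNS names in the
    # cluster file, and those names resolve from every cluster.
    useDNSInClusterFile: true
```

With a setup along these lines, a restarted Pod that comes back with a new IP is still reachable through its DNS name in the cluster file, without relying on ClusterIP services.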