FDB cluster with Kubernetes StatefulSet

We are planning to deploy FDB clusters in Kubernetes using a StatefulSet. We ran some tests successfully, but we’re posting here to double-check that what we’re doing makes sense before moving to production.

Our setup supports N nodes and K coordinators with the following configuration:

  • PersistentVolumes
    • PersistentVolume-0, PersistentVolume-1 … PersistentVolume-N
    • these are all manually created and assigned a static awsElasticBlockStore.volumeID that points to an existing EBS volume
  • StatefulSet
    • storage-0, storage-1 … storage-N
    • out of the N pods, we consider the first K to be the coordinators
  • Headless Service
    • named “storage”
    • selecting all pods of the stateful set

Generating the cluster file (fdb.cluster) and configuration file (foundationdb.conf)
We run our own process named “storage” side by side with fdbserver and fdbmonitor in each pod.
This process is responsible for generating both the fdb.cluster and foundationdb.conf files.
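For reference, fdb.cluster is a single line of the form `description:ID@ip:port,ip:port,...`, e.g. `storage:abc123@10.0.1.5:4500,10.0.1.6:4500,10.0.1.7:4500` (placeholder values). Regenerating it only replaces the address list after the `@`; the `description:ID` prefix never changes.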

The process takes K (the number of coordinators) as a parameter and resolves the IP of each coordinator via the pod DNS names in the Kubernetes cluster: storage-0.storage, storage-1.storage, … storage-K.storage.

We run the resolution in a loop; when we detect that the IPs have changed, we generate a new fdb.cluster file and update foundationdb.conf to point to it. fdbmonitor then automatically restarts the fdbserver processes.
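A rough sketch of that loop in Go (the values for K, the port, the description:ID, and the file path are placeholders, not our real configuration; the real process also rewrites foundationdb.conf):

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

const (
	numCoordinators = 3                      // K, passed as a parameter in the real process
	serviceDomain   = "storage"              // headless service name
	clusterFilePath = "/var/fdb/fdb.cluster" // placeholder path
	fdbPort         = 4500
	clusterPrefix   = "storage:abc123" // description:ID, never changed
)

// resolveCoordinators resolves storage-0.storage … storage-(K-1).storage to IP:port pairs.
func resolveCoordinators() ([]string, error) {
	var addrs []string
	for i := 0; i < numCoordinators; i++ {
		host := fmt.Sprintf("storage-%d.%s", i, serviceDomain)
		ips, err := net.LookupHost(host)
		if err != nil || len(ips) == 0 {
			return nil, fmt.Errorf("resolving %s: %v", host, err)
		}
		addrs = append(addrs, fmt.Sprintf("%s:%d", ips[0], fdbPort))
	}
	return addrs, nil
}

func main() {
	var last string
	for {
		if addrs, err := resolveCoordinators(); err == nil {
			// fdb.cluster is a single line: description:ID@addr,addr,...
			line := fmt.Sprintf("%s@%s", clusterPrefix, strings.Join(addrs, ","))
			if line != last {
				if err := os.WriteFile(clusterFilePath, []byte(line+"\n"), 0644); err == nil {
					last = line
					// The real process also updates foundationdb.conf here;
					// fdbmonitor then restarts the fdbserver processes.
				}
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```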

NOTE that we never run the “coordinators” command, and we never change the cluster description or id.

This has worked in a test environment, but we want to make sure we won’t end up with a bad configuration where we lose coordinator sync.

Sample Kubernetes YAML files:
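Roughly, they look like the following sketch (the EBS volume ID, container image, storage size, and replica count are placeholders, not the actual values we use):

```yaml
# Headless service so storage-0.storage, storage-1.storage, ... resolve to pod IPs.
apiVersion: v1
kind: Service
metadata:
  name: storage
spec:
  clusterIP: None
  selector:
    app: storage
  ports:
    - name: fdb
      port: 4500
---
# One of the manually created PersistentVolumes, bound to an existing EBS volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistentvolume-0
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0   # placeholder
    fsType: ext4
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: storage
spec:
  serviceName: storage
  replicas: 5                          # N
  selector:
    matchLabels:
      app: storage
  template:
    metadata:
      labels:
        app: storage
    spec:
      containers:
        - name: storage
          image: our-fdb-image:latest  # placeholder; runs fdbmonitor plus our sidecar
          ports:
            - containerPort: 4500
          volumeMounts:
            - name: data
              mountPath: /var/fdb/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        # Empty storage class so the claims bind to the pre-created PVs
        # instead of triggering dynamic provisioning.
        storageClassName: ""
        resources:
          requests:
            storage: 100Gi
```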

Open Questions:

  1. Since we never use the “coordinators” command, could we end up in a locked state when adding or removing coordinators? Pods are restarted serially, so there will temporarily be inconsistent fdb.cluster files coexisting: one for the old coordinator set and one for the new coordinator set.

  2. We want to be able to survive a full cluster outage, since we keep all pods on a single node. So far, deleting and re-creating the StatefulSet has been successful, but I wonder if this will become a problem in the future.


Is there a reason you aren’t just using the FoundationDB Kubernetes Operator?

Unfortunately, there is a very important scenario that is currently not supported by the FDB Operator: failing and recovering all coordinators.


That is a scenario we plan to address in the medium term, once FDB has support for DNS names in the cluster file.


As a shorter-term solution, we could investigate having the operator use a similar approach where we have the sidecar/init process take a cluster file with DNS names and resolve them into IPs. I would have concerns about the behavior when a single coordinator pod gets rescheduled, since it would come up with a new IP but the existing processes would discover it on the old IP. We might also have to change the behavior around storing the cluster file in the data directory, since in this configuration the dynamically generated cluster file would be the more reliable one.

This issue presents another potential solution to the problem of IPs changing when you recreate all pods: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/283