I see good discussion on k8s here, so I'm posting my requirements. Please correct me if I should post this somewhere else.
I have 2 separate requirements for deploying fdb on k8s as a StatefulSet, in which each pod has its own dedicated Persistent Volume.
Deploying/Scaling on k8s
I will first illustrate how Cassandra is deployed/scaled on k8s.
Cassandra has a concept of a “seed”: you can give a new node any existing node of the cluster, and the new node will join the cluster.
The steps are:
- “New Node” contacts “Seed Node” and gets the cluster info (effectively the cluster file)
- “New Node” makes this info durable (effectively creating the cluster file locally)
- “New Node” joins the cluster
Note: Here, the “Seed Node” is not required to be a special node (e.g., a coordinator).
The first k8s pod is self-seeded and the next ones are seeded by the previous ones (a StatefulSet has ordered pods).
In general, one should try to have 3 designated seed nodes (per datacenter), so that seeding is resilient to pod failures. For this, one can treat the initial nodes as trusted seeds.
It’s important to note that your seed node might have been excluded from the cluster, so your new node might get old cluster information (a stale cluster file); that is something that needs to be managed (I will not go into the details, as this needs to be managed in any approach).
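In k8s terms, this seeding is usually just an environment variable on each pod. A minimal sketch (the StatefulSet/service name “cassandra” and the “default” namespace are illustrative, not from any specific chart):

```
# Each Cassandra pod gets pod 0 of its headless service as the seed;
# the names below are assumptions for illustration only.
CASSANDRA_SEEDS="cassandra-0.cassandra.default.svc.cluster.local"
```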
Now let's go to fdb's docs: Adding machines to a cluster
- Copy an existing cluster file from a server in your cluster to the new machine, overwriting the existing fdb.cluster file
- Restart FoundationDB on the new machine so that it uses the new cluster file
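For reference, the manual version of those two steps looks roughly like this (default Linux install paths assumed, “existing-server” is a placeholder; adjust to your setup):

```
# Run on the new machine:
scp existing-server:/etc/foundationdb/fdb.cluster /etc/foundationdb/fdb.cluster
sudo service foundationdb restart
```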
How should I do this automatically while I deploy fdb as StatefulSet?
Please note that when I say automatically, I mean that I should just run a “kubectl scale statefulset” command and my cluster should scale (assuming all of my pods are identical by default).
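That is, the whole scale operation should be a single command, something like this (the StatefulSet name “fdb” is just an example):

```
kubectl scale statefulset fdb --replicas=5
```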
There are workarounds:
- Use a shared Persistent Volume across all pods to share cluster file
- Fix the coordinator pods (StatefulSet pods are ordered)
- Use kubectl cp to copy the cluster file into new pods
- Enable ssh across pods to fetch the cluster file
But all of these workarounds have drawbacks.
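For illustration, here is a rough, untested sketch of an entrypoint combining the first two workarounds (pod 0 self-seeds, later pods read the cluster file from a shared volume); all names, paths, and the port are my assumptions:

```
#!/bin/bash
# Hypothetical pod entrypoint; purely illustrative, not a tested recipe.
CLUSTER_FILE=/etc/foundationdb/fdb.cluster
ORDINAL="${HOSTNAME##*-}"   # StatefulSet pods are named <name>-<ordinal>

if [ ! -s "$CLUSTER_FILE" ]; then
  if [ "$ORDINAL" = "0" ]; then
    # Self-seed: write a fresh cluster file pointing at this pod (default port 4500),
    # then publish it on the shared volume for the other pods.
    echo "fdb:init@$(hostname -i):4500" > "$CLUSTER_FILE"
    cp "$CLUSTER_FILE" /seed/fdb.cluster
  else
    # Seed from the shared volume mounted at /seed (kubectl cp or ssh would be
    # the alternatives, each with its own drawbacks).
    cp /seed/fdb.cluster "$CLUSTER_FILE"
  fi
fi

exec fdbmonitor --conffile /etc/foundationdb/foundationdb.conf
```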
Feature request: Could we have fdb support such seeding as part of inter-node communication?
Recovering pod after failure
When a pod in a StatefulSet fails, k8s allocates a new pod with the same DNS name, but possibly at a different IP. Note that the data it had is persisted in the attached Persistent Volume.
How should I make the fdb cluster let such a node rejoin after it comes back with its data (assume here that it has the cluster file and can effectively connect to the cluster)?
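To make the issue concrete (the headless service name “fdb” and the “default” namespace are assumptions of mine):

```
# The DNS name is stable, but the pod may come back at a different IP:
kubectl get pod fdb-1 -o wide
nslookup fdb-1.fdb.default.svc.cluster.local

# From inside the pod, check whether the process rejoined the cluster:
fdbcli -C /etc/foundationdb/fdb.cluster --exec "status minimal"
```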
Feature request: Could we have fdb support identification of nodes in a cluster by some generated ID, independent of IP/host?
Some more info
Please note that here I have not requested features like
- identify coordinators by DNS hostname
- support DNS hostnames in the cluster file
I see the “cluster file” as a seed in a generic sense, and I don't care whether it has DNS names or IPs inside it. I know my pods in the cluster (DNS name and current IP) and just want generic seeding to be available in fdb.
Similarly, I don't want a node identifier based on the DNS name. That would help me in a StatefulSet deployment, but I believe it's not something fdb should care about; fdb should implement this in a generic way.
Note: I know that writing a custom k8s controller (with a custom scale operation) is one solution here. But maintaining that is another overhead for me.