Context: I did some experiments with the operator in a test cluster. Under load, Kubernetes evicted the storage Pods because they were using too much memory (I had specified quite low resource requests and limits, since it is a small cluster just for experiments).
So far this is expected, given my configuration mistakes.
However, the storage Pods remained in the “evicted” state, even after adding new nodes to the cluster. This is unexpected, since evicted Pods belonging to ordinary Deployments in Kubernetes are recreated automatically.
To recover, I tried the following steps:

1. Manually edit the connection string in the operator status so that it contains the correct IPs.
2. Manually edit the cluster file at /var/fdb/data/fdb.cluster on each pod so that it contains the correct IPs.
3. Bounce all of the fdbserver processes.
Unfortunately, step 1 did not work: after editing the connection string, the status was immediately reverted to the old value.
I also tried editing the ConfigMap, but the value was changed back to the old one after a short time.
Thus, I was able to get the cluster working again, but the fdb-operator can no longer talk to the cluster. And since the ConfigMap still contains the wrong values, any newly started pods will get the wrong cluster file and fail to connect.
Finally, I tried the kubectl fdb analyze example-cluster --auto-fix command, which suggested replacing all instances in the cluster. However, this just started the new processes without updating the connection string in the operator, so it did not fix the problem.
Questions:
How can I correctly recover from IP address changes?
Is it expected that Pods are not automatically restarted after being evicted, or is this a configuration error on my side?
From my first experience with the operator, I have the impression that it is not yet ready to be used in production, and that I should rather set it up outside of Kubernetes on machines with fixed IP addresses. Would you agree with this impression, or is it something that can be fixed by configuring the operator differently?
I think this is a recovery procedure that we need to test and document better. One subtlety of “edit the connection string in the operator status” is that editing the status through kubectl commands will not work, because those commands do not support the status subresource (see kubectl edit or apply can not update .status when status sub resource is enabled · Issue #564 · kubernetes/kubectl · GitHub). You can work around this by providing the new connection string as a seedConnectionString in the spec, or by running raw API requests to update the status subresource. The seedConnectionString option would require that the operator be able to connect to the database using the new connection string, which may be a catch-22. However, I think that editing the cluster files in the data directory and bouncing the processes would be sufficient to get the cluster available and working again, so we may want to adjust this procedure based on the results of more testing.
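For illustration, a seedConnectionString spec might look roughly like the sketch below; the apiVersion, cluster name, FDB version, and connection string are all placeholders for whatever your cluster actually uses:

```yaml
# Sketch only: supply the current connection string as a seed so the operator
# can try to re-establish contact with the database. All values are placeholders.
apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: example-cluster
spec:
  version: 6.2.30
  # description:id@coordinator-ip:port,... using the coordinators' new IPs
  seedConnectionString: "example_cluster:abcd1234@10.1.2.3:4501,10.1.2.4:4501,10.1.2.5:4501"
```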
The operator does not automatically recreate evicted pods, but you can file an issue in the operator’s GitHub repository describing the behavior you’re looking for.
The behavior around IPs changing is a major concern for production readiness. We have an alternative configuration where you can use service IPs as the public IPs for the processes, which is described in the operator’s user manual: fdb-kubernetes-operator/customization.md at master · FoundationDB/fdb-kubernetes-operator · GitHub. In the medium term, we plan to change the way the coordinators are specified so we can use DNS names, but that will require changes in the FoundationDB core as well as the operator.
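As a rough sketch of that configuration (the field placement has moved around between operator releases, so treat this as illustrative and check the linked customization.md for the release you are running):

```yaml
# Sketch only: use per-process service IPs as the processes' public IPs, so the
# addresses survive pod recreation. Field placement varies by operator release;
# the linked customization.md is authoritative.
apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: example-cluster
spec:
  version: 6.2.30
  services:
    publicIPSource: service
```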
One additional thing to note is that we recommend running FDB with limits equal to requests, to ensure consistent performance and avoid eviction. If you omit the limits from the pod specs, the operator will set the limits to be equal to the requests as a default. This won’t protect against all cases where you would lose multiple pods, but it can help limit the frequency of large-scale disruptive events.
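In the cluster spec that can look roughly like the following (a sketch; the container name, resource values, and exact field paths depend on the operator release you are running):

```yaml
# Sketch only: setting limits equal to requests on the main container helps it
# avoid eviction under node pressure. For the Guaranteed QoS class, every
# container in the pod needs matching requests and limits.
apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: example-cluster
spec:
  version: 6.2.30
  processes:
    general:
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: "1"
                  memory: 4Gi
                limits:
                  cpu: "1"
                  memory: 4Gi
```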
One more question regarding the memory limits: Is FoundationDB aware of the limits set in Kubernetes?
If I understood the documentation on fdbserver configuration correctly, the default is 8 GB. Does this mean I should set the requests (and limits) in the operator to 8 GB to avoid memory problems?
I’m asking because I am currently experimenting with a rather small cluster to save costs, and I am trying to figure out whether these are meaningful experiments or whether I have to go directly to more costly ones.
FoundationDB is not aware of the limits in Kubernetes unless you provide the limit in the --memory flag. We could consider passing that automatically in the operator to keep things lined up, though.
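If you want to do that by hand today, something like the following sketch should work; the field paths and the value are placeholders, and older operator releases put customParameters at a different level of the spec:

```yaml
# Sketch only: customParameters entries are forwarded to fdbserver, so this
# should end up as --memory=4GiB on the process. Keep the value in sync with
# the container's memory limit.
apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: example-cluster
spec:
  version: 6.2.30
  processes:
    general:
      customParameters:
        - "memory=4GiB"
```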