Recovering from FoundationDB crashes

Context: I did some experiments with the operator in a test cluster. Under load, Kubernetes evicted the storage pods because they were using too much memory (I had specified quite low resource requests and limits, since it is a small cluster just for experiments).

So far this is expected, given my configuration mistakes.

However, the storage Pods remained in the “evicted” state, even after adding new nodes to the cluster. This is unexpected, since other Kubernetes deployments restart their pods automatically.

So I manually killed the pods to restore the database. This changed the IP addresses of the coordinators, which apparently is not yet supported by the operator (it seems to be a planned feature: Referencing pods by IP is fragile · Issue #266 · FoundationDB/fdb-kubernetes-operator · GitHub).

I found this thread, which explains how to handle changed IP addresses: K8s operator fdb.cluster IP addresses issue

  1. Manually edit the connection string in the operator status to have the correct IPs
  2. Manually edit the cluster files in /var/fdb/data/fdb.cluster on each pod to have the correct IPs
  3. Bounce all of the fdbserver processes
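For steps 2 and 3, here is a minimal sketch of what this might look like with kubectl. The label selector, cluster name, and connection string are assumptions for illustration and depend on the operator version:

```bash
# Hypothetical corrected connection string with the new coordinator IPs.
NEW_CS='example_cluster:abc123@10.1.2.3:4501,10.1.2.4:4501,10.1.2.5:4501'

for pod in $(kubectl get pods -l fdb-cluster-name=example-cluster -o name); do
  # Step 2: rewrite the cluster file on the pod with the corrected IPs.
  kubectl exec "$pod" -- bash -c "echo '$NEW_CS' > /var/fdb/data/fdb.cluster"
  # Step 3: kill fdbserver so fdbmonitor restarts it with the new cluster file.
  kubectl exec "$pod" -- pkill fdbserver
done
```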

Unfortunately, step 1 did not work. After editing the connection string, the status is immediately restored to the old value.
I also tried editing the ConfigMap, but the value is changed back to the old one after a short time.

Thus, I was able to get the cluster working again, but the fdb-operator can no longer talk to it. And since the ConfigMap still contains the wrong values, any newly started pods will get the wrong cluster file and fail to connect.

Finally, I tried the kubectl fdb analyze example-cluster --auto-fix command, which suggested replacing all instances in the cluster. However, this just started the new processes without updating the connection string in the operator, so it did not fix the problem.

Questions:

  1. How can I correctly recover from IP address changes?
  2. Is it expected that Pods are not automatically restarted after being evicted, or is this a configuration error on my side?
  3. From my first experience with the operator, I have the impression that it is not yet ready for production use and that I should instead try to set it up outside of Kubernetes on machines with fixed IP addresses. Would you agree with this impression, or is this something that can be fixed by configuring the operator differently?

I think this is a recovery procedure that we need to test and document better. One subtlety of “edit the connection string in the operator status” is that editing the status through kubectl commands will not work, because those commands do not support the status subresource (see kubectl edit or apply can not update .status when status sub resource is enabled · Issue #564 · kubernetes/kubectl · GitHub). You can work around this by providing the new connection string as a seedConnectionString in the spec, or by running raw API requests to update the status subresource. The seedConnectionString option would require that the operator be able to connect to the database using the new connection string, which may be a catch-22. However, I think that editing the cluster files in the data directory and bouncing the processes would be sufficient to get the cluster available and working again, so we may want to adjust this procedure based on the results of more testing.
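As a rough sketch of both workarounds, assuming a cluster named example-cluster in the default namespace, the apps.foundationdb.org/v1beta1 API version, and placeholder connection strings (verify the field names against your operator version):

```bash
# Workaround 1: put the corrected string in the spec as a seed. This only
# helps if the operator can actually reach the database through it.
kubectl patch foundationdbcluster example-cluster --type merge \
  -p '{"spec": {"seedConnectionString": "example_cluster:abc123@10.1.2.3:4501,10.1.2.4:4501,10.1.2.5:4501"}}'

# Workaround 2: update the status subresource with a raw API request, since
# kubectl edit/apply ignore .status when the status subresource is enabled.
kubectl proxy --port=8001 &
STATUS_URL='http://127.0.0.1:8001/apis/apps.foundationdb.org/v1beta1/namespaces/default/foundationdbclusters/example-cluster/status'
curl -s "$STATUS_URL" \
  | sed 's|"connectionString": *"[^"]*"|"connectionString": "example_cluster:abc123@10.1.2.3:4501,10.1.2.4:4501,10.1.2.5:4501"|' \
  | curl -s -X PUT -H 'Content-Type: application/json' --data-binary @- "$STATUS_URL"
```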

The operator does not automatically recreate evicted pods, but you can file an issue describing the behavior you’re looking for here.

The behavior around IPs changing is a major concern for production readiness. We have an alternative configuration where you can use service IPs as the public IPs for the processes, which is described in the operator’s user manual: fdb-kubernetes-operator/customization.md at master · FoundationDB/fdb-kubernetes-operator · GitHub. In the medium term, we plan to change the way the coordinators are specified so we can use DNS names, but that will require changes in the FoundationDB core as well as the operator.
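For illustration, the service-IP option is a single field in the cluster spec. Its exact location has moved between operator versions (under spec.services in older releases, spec.routing in newer ones), so treat this sketch as an assumption to check against the manual linked above:

```yaml
apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: example-cluster
spec:
  version: 6.2.30
  services:
    # Advertise the per-pod service IP instead of the pod IP, so recreating
    # a pod does not change the address the process is known by.
    publicIPSource: service
```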

One additional thing to note is that we recommend running FDB with limits equal to requests, to ensure consistent performance and avoid eviction. If you omit the limits from the pod specs, the operator will set the limits to be equal to the requests as a default. This won’t protect against all cases where you would lose multiple pods, but it can help limit the frequency of large-scale disruptive events.
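For illustration, setting requests and limits explicitly (and equal) on the main container might look like this; the sizes are placeholders and the container name follows the operator's convention:

```yaml
spec:
  processes:
    general:
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: "1"
                  memory: 8Gi
                limits:
                  cpu: "1"      # equal to the request
                  memory: 8Gi   # equal to the request
```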

Thanks, that helped a lot.

One more question regarding the memory limits: Is FoundationDB aware of the limits set in Kubernetes?
If I understood the documentation on fdbserver configuration correctly, the default memory limit is 8 GB. Does this mean I should set the request (and limit) in the operator to 8 GB to avoid memory problems?

I am asking because I am currently experimenting with a rather small cluster to save costs, and I am trying to figure out whether these experiments are meaningful or whether I have to go directly to more costly ones.

FoundationDB is not aware of the limits in Kubernetes unless you provide the limit in the --memory flag. We could consider passing that automatically in the operator to keep things lined up, though.
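As a sketch (not something the operator does for you automatically today): customParameters are appended to the fdbserver command line as --&lt;name&gt;=&lt;value&gt;, so you could pass the memory ceiling yourself and keep it in sync with the pod's memory limit. Whether "memory" is accepted as a custom parameter depends on the operator version, so treat this as an assumption to verify:

```yaml
spec:
  processes:
    general:
      customParameters:
        # Becomes --memory=4GiB on the fdbserver command line; keep this in
        # sync with the container's memory limit (fdbserver defaults to 8GiB).
        - "memory=4GiB"
```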

I ran some tests on this recovery procedure and added documentation for it to the operator documentation: Add a command to fix coordinator IPs after a majority of coordinators get new IPs. by brownleej · Pull Request #892. This PR also includes a kubectl-fdb command to help with the recovery process.
