Dear experts,
We are trying to upgrade the fdb-kubernetes-operator to v1.4.1, and its Dockerfile specifies ARG FDB_VERSION=7.1.5.
Does this mean operator 1.4.1 can only work with FoundationDB version 7.1.5?
I tried operator 1.4.1 with FoundationDB 6.2.29 (we wrap the FDB operator), and in one of our test scenarios I saw this error:
{"level":"info","ts":1656557697.521365,"logger":"fdbclient","msg":"Command completed","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","output":"The database is unav..."}
{"level":"error","ts":1656557697.5215714,"logger":"controller","msg":"Error in reconciliation","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","subReconciler":"controllers.changeCoordinators","requeueAfter":0,"error":"unable to fetch connection string: The database is unavailable; type `status' for more information.\n","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBClusterReconciler).Reconcile\n\t/workspace/controllers/cluster_controller.go:183\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}
{"level":"error","ts":1656557697.5217397,"logger":"controller-runtime.manager.controller.foundationdbcluster","msg":"Reconciler error","reconciler group":"apps.foundationdb.org","reconciler kind":"FoundationDBCluster","name":"mdm-foundationdb-ibm","namespace":"testoperator1","error":"unable to fetch connection string: The database is unavailable; type `status' for more information.\n","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}
I checked the pod with the cluster_controller role and didn't find the fdb.cluster file it is trying to read from.
Is this expected?
What is the compatibility between the operator version and the FoundationDB version? Thanks.
@johscheuer thank you. When compiling the operator I used FoundationDB v6.2.29; will that cause issues, or do I have to use 7.1.5?
The error occurred in a recovery scenario: if I kill 3 storage pods, it triggers a coordinator change, and then I observe this "database is unavailable" issue. Any suggestions on how to solve it? Thanks.
I checked one controller pod:
1000710+ 9 8 0 13:06 ? 00:00:09 /usr/bin/fdbserver --class cluster_controller --cluster_file /var/fdb/data/fdb.cluster --datadir /var/
1000710+ 474 0 0 13:34 pts/0 00:00:00 sh -i -c TERM=xterm sh
1000710+ 480 474 0 13:34 pts/0 00:00:00 sh
1000710+ 481 480 0 13:34 pts/0 00:00:00 ps -ef
$ cat /var/fdb/data/fdb.cluster
# DO NOT EDIT!
# This file is auto-generated, it is not to be edited by hand
mdm_foundationdb_ibm:SZ7NOf9vA1aDvRJBv3niEqrZTbLXkeBw@10.254.15.3:4500:tls,10.254.16.228:4500:tls,10.254.20.222:4500:tls
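For anyone who wants to compare the coordinator addresses in this file against the current pod IPs, here is a minimal Python sketch that splits a cluster file like the one above into its parts. It assumes the layout visible in the output (description:id@host:port[:tls], comma-separated, with # comment lines); the names parse_cluster_file, Coordinator and ClusterFile are made up for illustration and are not part of any FoundationDB tooling. IPv6 addresses are not handled.

```python
from typing import List, NamedTuple


class Coordinator(NamedTuple):
    host: str
    port: int
    tls: bool


class ClusterFile(NamedTuple):
    description: str
    cluster_id: str
    coordinators: List[Coordinator]


def parse_cluster_file(text: str) -> ClusterFile:
    """Parse the contents of an fdb.cluster file (hypothetical helper)."""
    # Drop blank lines and '#' comments; exactly one connection string should remain.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if len(lines) != 1:
        raise ValueError("expected exactly one connection string line")

    prefix, _, addresses = lines[0].partition("@")
    description, _, cluster_id = prefix.partition(":")
    if not (description and cluster_id and addresses):
        raise ValueError("malformed connection string")

    coordinators = []
    for address in addresses.split(","):
        parts = address.strip().split(":")
        tls = parts[-1] == "tls"
        if tls:
            parts = parts[:-1]
        if len(parts) != 2:
            raise ValueError(f"malformed coordinator address: {address!r}")
        coordinators.append(Coordinator(parts[0], int(parts[1]), tls))

    return ClusterFile(description, cluster_id, coordinators)
```

The resulting coordinator hosts can then be compared with the pod IPs reported by kubectl to spot stale entries.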
Thank you @johscheuer. I used kubectl fdb fix-coordinator-ips to correct the coordinator IPs, and it works. Just curious, is there a reason this procedure isn't automated, so that end users don't need to run the command manually?
I think the honest answer is that no one has implemented the logic in the operator yet (there should be an issue on GitHub where we discussed this), and with 7.1, DNS support for the cluster file is coming, which means we won't need this hack anymore and the operator can use a headless service. Just a warning: if you want to try out the DNS support, there might be some issues, so you should first test it in a test cluster. Once the DNS support is stable and used in production, we will announce it.
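To check whether a given cluster file still pins coordinators by IP (and is therefore affected when pods get new IPs) or already uses DNS names, something like the following Python sketch could help. It assumes coordinator entries of the form host:port[:tls]; the function name and the hostname in the usage note are made up for illustration, and bracketed IPv6 addresses are not handled.

```python
import ipaddress


def coordinator_uses_dns(coordinator: str) -> bool:
    """Return True if the host part of 'host:port[:tls]' is not a literal IP.

    Hypothetical helper, not part of the operator or the kubectl plugin.
    """
    host = coordinator.strip().split(":")[0]
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IP address, so treat it as a DNS name.
        return True
    return False
```

For example, coordinator_uses_dns("10.254.15.3:4500:tls") is False, while an entry with a made-up name such as "example-storage-1.example.svc.cluster.local:4500:tls" would return True.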