Which FoundationDB server versions are compatible with which operator versions?

Dear experts,
We are trying to upgrade the fdb-kubernetes-operator to v1.4.1, and the Dockerfile mentions:

ARG FDB_VERSION=7.1.5

Does that mean operator 1.4.1 can only work with FoundationDB version 7.1.5?
We wrap the fdb operator, and when I tried operator 1.4.1 with FoundationDB 6.2.29, I saw this error in one of our testing scenarios:

{"level":"info","ts":1656557697.521365,"logger":"fdbclient","msg":"Command completed","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","output":"The database is unav..."}
{"level":"error","ts":1656557697.5215714,"logger":"controller","msg":"Error in reconciliation","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","subReconciler":"controllers.changeCoordinators","requeueAfter":0,"error":"unable to fetch connection string: The database is unavailable; type `status' for more information.\n","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBClusterReconciler).Reconcile\n\t/workspace/controllers/cluster_controller.go:183\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}
{"level":"error","ts":1656557697.5217397,"logger":"controller-runtime.manager.controller.foundationdbcluster","msg":"Reconciler error","reconciler group":"apps.foundationdb.org","reconciler kind":"FoundationDBCluster","name":"mdm-foundationdb-ibm","namespace":"testoperator1","error":"unable to fetch connection string: The database is unavailable; type `status' for more information.\n","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}

I checked the pod with the cluster_controller role and didn’t find the fdb.cluster file it is trying to read from.
Is this expected?
What is the compatibility between operator versions and FoundationDB versions? Thanks.

This doc contains all the information about the supported versions: fdb-kubernetes-operator/compatibility.md at main · FoundationDB/fdb-kubernetes-operator · GitHub. The version you referenced is only the version used for compiling the operator, not the supported version.

At what point did you see this error message? During the creation of the cluster?

@johscheuer thank you. While compiling the operator I used FoundationDB v6.2.29. Will that cause issues, or do I have to use 7.1.5?
The error showed up in a recovery scenario: if I kill 3 storage pods, that triggers a coordinator change, and then I observed this database-unavailable issue. Any suggestions on how to solve it? Thanks.
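
For reference, this scenario can be reproduced with something like the following (pod and namespace names taken from this thread):

kubectl -n testoperator1 delete pod mdm-foundationdb-ibm-storage-1 mdm-foundationdb-ibm-storage-2 mdm-foundationdb-ibm-storage-3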

I checked one controller pod:

1000710+       9       8  0 13:06 ?        00:00:09 /usr/bin/fdbserver --class cluster_controller --cluster_file /var/fdb/data/fdb.cluster --datadir /var/
1000710+     474       0  0 13:34 pts/0    00:00:00 sh -i -c TERM=xterm sh
1000710+     480     474  0 13:34 pts/0    00:00:00 sh
1000710+     481     480  0 13:34 pts/0    00:00:00 ps -ef
$ cat /var/fdb/data/fdb.cluster
# DO NOT EDIT!
# This file is auto-generated, it is not to be edited by hand
mdm_foundationdb_ibm:SZ7NOf9vA1aDvRJBv3niEqrZTbLXkeBw@10.254.15.3:4500:tls,10.254.16.228:4500:tls,10.254.20.222:4500:tls

The IPs in fdb.cluster are stale. Checking the storage pods:

mdm-foundationdb-ibm-storage-1                                2/2     Running     0          25m   10.254.16.235   worker0.fdbtest3.cp.fyre.ibm.com   <none>           <none>
mdm-foundationdb-ibm-storage-2                                2/2     Running     0          25m   10.254.15.5     worker2.fdbtest3.cp.fyre.ibm.com   <none>           <none>
mdm-foundationdb-ibm-storage-3                                2/2     Running     0          25m   10.254.20.223   worker1.fdbtest3.cp.fyre.ibm.com   <none>           <none>
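
One way to spot the mismatch is to compare the coordinator addresses in the cluster file with the current pod IPs. A hedged example, reusing the names from this thread (the container name and the pod label are assumptions based on the operator's defaults):

kubectl -n testoperator1 exec mdm-foundationdb-ibm-storage-1 -c foundationdb -- cat /var/fdb/data/fdb.cluster
kubectl -n testoperator1 get pods -l foundationdb.org/fdb-cluster-name=mdm-foundationdb-ibm -o wide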

In the latest Dockerfile we actually use 6.2.29 for building (fdb-kubernetes-operator/Dockerfile at main · FoundationDB/fdb-kubernetes-operator · GitHub), so that the Go bindings and the version we compile against support the minimum compatible version (6.2). There is no need to compile the operator with a different FDB version; you can simply inject the required libraries and binaries with init containers, see: fdb-kubernetes-operator/manager.yaml at main · FoundationDB/fdb-kubernetes-operator · GitHub, and the sketch below.
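
A rough sketch of that init-container pattern, following the manager.yaml example (the image tag, container name, and volume name here are placeholders to adapt; the --copy-library/--copy-binary/--output-dir/--init-mode flags come from the foundationdb-kubernetes-sidecar image):

initContainers:
  # Copies the 7.1 client library plus the fdbcli/fdbbackup/fdbrestore
  # binaries into a shared volume, then exits (--init-mode).
  - name: foundationdb-kubernetes-init-7-1
    image: foundationdb/foundationdb-kubernetes-sidecar:7.1.5-1
    args:
      - --copy-library
      - "7.1"
      - --copy-binary
      - fdbcli
      - --copy-binary
      - fdbbackup
      - --copy-binary
      - fdbrestore
      - --output-dir
      - /var/output-files/7.1.5
      - --init-mode
    volumeMounts:
      - name: fdb-binaries
        mountPath: /var/output-files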

If you kill too many coordinators at once you can run into this: fdb-kubernetes-operator/warnings.md at main · FoundationDB/fdb-kubernetes-operator · GitHub. This can be fixed manually with the kubectl fdb plugin: fdb-kubernetes-operator/debugging.md at main · FoundationDB/fdb-kubernetes-operator · GitHub.
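
A minimal usage sketch for that plugin command, reusing the namespace and cluster name from this thread (check kubectl fdb fix-coordinator-ips --help for the exact flags of your plugin version):

kubectl fdb -n testoperator1 fix-coordinator-ips -c mdm-foundationdb-ibm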

Thank you @johscheuer. I used kubectl fdb fix-coordinator-ips to correct the coordinator IPs, and it works. Just out of curiosity, is there a reason we don’t automate this procedure so that end users don’t need to run this command manually?

I think the honest answer is that nobody has implemented that logic in the operator yet (there should be an issue on GitHub where we discussed this), and with 7.1, DNS support for the cluster file is coming, which means we won’t have to do this hack anymore and the operator can use a headless service. Just as a warning: if you want to try out the DNS support, there might be some issues, so you should test it in a test cluster first. Once the DNS support is stable and used in production, we will announce it.
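
If you want to experiment with it anyway, the opt-in lives in the FoundationDBCluster spec. A minimal sketch, assuming the v1beta2 field routing.useDNSInClusterFile and an FDB version of 7.1 or later (verify the field against the CRD of your operator release):

apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: mdm-foundationdb-ibm
spec:
  version: 7.1.5
  routing:
    # Use headless-service DNS names instead of pod IPs in fdb.cluster.
    useDNSInClusterFile: true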