We are doing an upgrade test from apple operator 0.48 to 1.4.1. Because the CRD changes, we currently have to recreate the CRD, which means the CR is recreated as well and the PVCs belonging to that CR are deleted, so all data on the PVCs is lost. Is there a way to upgrade the CRD from 0.48 to 1.4.1 without deleting the old CRD, i.e. just patching it, so that our CR and the data on the PVCs are both kept? Thanks!
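For example, we were hoping for something along these lines (the manifest path is an assumption based on the operator repository layout, so this is only a sketch; check the v1.4.1 release for the exact file):

```shell
# Sketch: update the CRD in place instead of deleting it. The file path
# below is an assumption based on the fdb-kubernetes-operator repository
# layout; check the v1.4.1 release for the exact manifest.
CRD_URL="https://raw.githubusercontent.com/FoundationDB/fdb-kubernetes-operator/v1.4.1/config/crd/bases/apps.foundationdb.org_foundationdbclusters.yaml"

# `kubectl replace` updates the existing CRD object, so the CRs (and
# therefore the PVCs) are left untouched:
kubectl replace -f "$CRD_URL"

# Verify which API versions are now served:
kubectl get crd foundationdbclusters.apps.foundationdb.org \
  -o jsonpath='{.spec.versions[*].name}'
```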
I have tried the solution and the CR is now on v1beta2, thanks! I am also trying to upgrade the FDB cluster from 6.2.29 to 6.3.24: I used the 6.3.24 client to build the 1.4.1 apple operator Docker image and upgraded the FDB cluster to 6.3.24, but in the apple operator log I still see the following error message:
...
{"level":"error","ts":1661767440.0894866,"logger":"controller","msg":"Error getting connection string from cluster","namespace":"testoperator1","cluster":"xxxxx","reconciler":"updateStatus","version":"6.2.29","connectionString":"xxxxx:Vsy7bpupiwkzdBkQzmGyZvyed4T16W1l@10.xxx.xxx.xxx:tls,10.xxx.xxx.xxx:4500:tls,10.xxx.xxx.xxx:4500:tls","error":"unable to fetch connection string: The database is unavailable; type `status' for more information.\n","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/controllers.updateStatus.reconcile\n\t/workspace/controllers/update_status.go:69\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBClusterReconciler).Reconcile\n\t/workspace/controllers/cluster_controller.go:169\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}
{"level":"info","ts":1661767440.0895357,"logger":"controller","msg":"Attempting to get connection string from cluster","namespace":"testoperator1","cluster":"xxxxx","reconciler":"updateStatus","version":"6.3.24","connectionString":"xxxxx:Vsy7bpupiwkzdBkQzmGyZvyed4T16W1l@10.xxx.xxx.xxx:4500:tls,10.xxx.xxx.xxx:4500:tls,10.xxx.xxx.xxx:4500:tls"}
...
It seems the database is in an inconsistent state, but I am not sure where to check.
During the FDB cluster upgrade I simply set skip to true, deleted the pods, updated the foundationdbcluster image to 6.3.24, and then set skip back to false, after which the pods came back. I am not sure whether some 6.2.29 information is still kept in the cluster; if so, how can I change it to 6.3.24?
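In other words, the sequence was roughly the following (a sketch only: the cluster name, namespace, and pod label are placeholders/assumptions, and the field names spec.skip and spec.version are taken from the v1beta2 FoundationDBCluster API, so verify them against your installed CRD):

```shell
# Rough sketch of the steps above; <name> and <namespace> are placeholders
# and the pod label is an assumption, so adjust to your setup.
cat > /tmp/fdb-upgrade-patch.yaml <<'EOF'
spec:
  version: 6.3.24
EOF

# Against the cluster (not run here):
#   kubectl patch foundationdbcluster <name> -n <namespace> \
#     --type merge -p '{"spec": {"skip": true}}'
#   kubectl delete pod -n <namespace> -l foundationdb.org/fdb-cluster-name=<name>
#   kubectl patch foundationdbcluster <name> -n <namespace> \
#     --type merge --patch-file /tmp/fdb-upgrade-patch.yaml
#   kubectl patch foundationdbcluster <name> -n <namespace> \
#     --type merge -p '{"spec": {"skip": false}}'

grep -c 'version: 6.3.24' /tmp/fdb-upgrade-patch.yaml  # prints: 1
```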
Is the upgrade somehow related to the initial question? You have to provide the client libraries for both 6.2 and 6.3, otherwise the operator is not able to communicate with the cluster in those versions.
@johscheuer After force-setting the running version, the version issue is gone, but I now encounter the following error in the apple operator log:
...
{"level":"error","ts":1662093651.1403134,"logger":"controller","msg":"Error in reconciliation","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","subReconciler":"controllers.updateDatabaseConfiguration","requeueAfter":0,"error":"FoundationDB error code 1031 (Operation aborted because the transaction timed out)","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBClusterReconciler).Reconcile\n\t/workspace/controllers/cluster_controller.go:183\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}
...
Any suggestions? Is there a formal guide for upgrading from FDB 6.2 to FDB 6.3 on the cloud? Thanks!
@johscheuer If I have installed a cluster (apple operator 0.48 + FDB 6.2.29) that already contains data, and I want to upgrade it to apple operator 1.4.1 + FDB 6.3.24, do you have any suggestions about the upgrade steps? Is there a detailed guide? Thanks!
Is there a reason you have to upgrade the operator and the FDB cluster at the same time? Or are those actually independent steps that happen one after the other?
After force-setting the running version, the version issue is gone, but I now encounter the following error in the apple operator log:
What do you mean by force-setting the running version? The provided error only suggests that the database is not available. The first step would be to check whether the database is really unavailable by exec’ing into a Pod; it could be that the connection string the operator uses is outdated.
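For example, something along these lines (a sketch: the pod name is a placeholder, and "foundationdb" is the operator's default main container name, so adjust to your setup):

```shell
# Check availability from inside a Pod; <pod> is a placeholder and
# "foundationdb" is the operator's default main container name.
kubectl exec -n testoperator1 <pod> -c foundationdb -- \
  fdbcli --exec "status minimal"

# Compare the cluster file the processes actually use with the one in the
# dynamic conf written by the operator:
kubectl exec -n testoperator1 <pod> -c foundationdb -- cat /var/fdb/data/fdb.cluster
kubectl exec -n testoperator1 <pod> -c foundationdb -- cat /var/dynamic-conf/fdb.cluster
```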
If I have installed a cluster (apple operator 0.48 + FDB 6.2.29) that already contains data, and I want to upgrade it to apple operator 1.4.1 + FDB 6.3.24, do you have any suggestions about the upgrade steps? Is there a detailed guide? Thanks!
@johscheuer After upgrading the operator from 0.48 to 1.4.1, with the CRD moving from v1beta1 to v1beta2, I found that some pods are stuck in “Init:0/1” status. I checked the events of those pods and found the following error message:
...
Warning FailedMount 28s (x2 over 4m59s) kubelet Unable to attach or mount volumes: unmounted volumes=[config-map], unattached volumes=[dynamic-conf data fdb-trace-logs fdb-certs config-map]: timed out waiting for the condition
...
I found that the ConfigMap now has only 2 items, while before the upgrade it had 6.
After I applied the new cluster file with the new FDB image, the problem remained the same, so I think we need to resolve this problem first before going ahead.
@johscheuer After checking the apple operator code I found that imageTypes is missing from the foundationdbcluster status after the upgrade, while on a fresh install imageTypes is present in the status. How can I add imageTypes to the foundationdbcluster status after the upgrade? Should it be added automatically by the apple operator during the upgrade? It does not seem to have been added here.
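If manual patching is an acceptable workaround, I assume it would look something like this (a sketch: "split" is my guess at the default image type, and patching the status subresource needs kubectl >= 1.24 for the --subresource flag):

```shell
# Hypothetical workaround: write imageTypes into the status subresource by
# hand. "split" is assumed to be the default image type; adjust if you use
# the unified image.
cat > /tmp/image-types-patch.json <<'EOF'
{"status": {"imageTypes": ["split"]}}
EOF

# Against the cluster (needs kubectl >= 1.24 for --subresource):
#   kubectl patch foundationdbcluster <name> -n testoperator1 \
#     --subresource=status --type merge --patch-file /tmp/image-types-patch.json

grep -c '"imageTypes"' /tmp/image-types-patch.json  # prints: 1
```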
Just for my understanding you performed those steps:
1.) Update the CRD to be served as v1beta1 and v1beta2 (e.g. by applying the CRD YAML from the fdb-kubernetes-operator repository in a version newer than or equal to 1.0)?
2.) After that you upgraded the operator from 0.48 to 1.4.1 (any reason to pick this specific version?)
Could you share the operator logs from during the upgrade? There might be some important information in them. I will try to reproduce this issue locally, but I won’t have time for that until the end of the week. Could you also ensure that you’re not using any deprecated fields in the FoundationDBCluster resource before upgrading? The kubectl fdb plugin should have a subcommand to help with that.
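If I remember the plugin correctly, something like the following should list the deprecated fields (the subcommand name is from memory, so confirm it with the plugin's help output):

```shell
# Check for deprecated fields before upgrading. The subcommand name is an
# assumption; run `kubectl fdb --help` to confirm it.
kubectl fdb deprecation -n testoperator1
```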
@johscheuer , thank you so much for your help!
I tried updating the status of the foundationdbcluster to add imageTypes; the ConfigMap issue is now gone and all pods are running. But the FDB cluster is still unavailable, and the coordinator IPs in the following places are all old ones:
/var/fdb/data/fdb.cluster
/var/dynamic-conf/fdb.cluster
fdbcli → status details → coordination servers (all unreachable)
foundationdbcluster status → connectionString
In all 4 places above the connection string is the same, but it is the old one: the IPs in the connection string no longer exist when you check with “oc get po -o wide”; all the pod IPs have changed. So we need to figure out how to make the IPs in the 4 places above consistent with the IPs of the currently running pods.
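One thing we plan to try (a sketch; the pod name is a placeholder): if at least a majority of the coordinators listed in the cluster file are still reachable, fdbcli should be able to move coordination to the new Pod IPs. With all coordinators unreachable this will not work and the cluster file has to be repaired first.

```shell
# List current pod IPs to compare against the connection string:
kubectl get pods -n testoperator1 -o wide

# If a majority of the old coordinators are still reachable, this asks FDB
# to pick a fresh set of coordinators; with all of them unreachable it
# cannot help, and the fdb.cluster file has to be fixed by hand first.
kubectl exec -n testoperator1 <pod> -c foundationdb -- \
  fdbcli --exec "coordinators auto"
```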
I found some error messages in the FDB pods:
fdbmonitor.log:
...
Time="1662460970.955128" Severity="10" LogGroup="mdm-foundationdb-ibm" Process="fdbmonitor": Updated configuration for fdbserver.1
Time="1662460974.246982" Severity="40" LogGroup="mdm-foundationdb-ibm" Process="fdbserver.1": Warning: FDBD has not joined the cluster after 5 seconds.
Time="1662460974.247006" Severity="40" LogGroup="mdm-foundationdb-ibm" Process="fdbserver.1": Check configuration and availability using the 'status' command with the fdbcli
...
And I also found some error messages in the operator log:
...
{"level":"error","ts":1662461344.6233137,"logger":"fdbclient","error":"FoundationDB error code 1031 (Operation aborted because the transaction timed out)","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/fdbclient.(*cliAdminClient).GetStatus\n\t/workspace/fdbclient/admin_client.go:237\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.updateStatus.reconcile\n\t/workspace/controllers/update_status.go:82\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBClusterReconciler).Reconcile\n\t/workspace/controllers/cluster_controller.go:169\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}
...
I have also found that in a pure-install environment the foundationdbcluster status contains the following field:
...
hasListenIPsForAllPods: true
...
But in the upgraded cluster this field is missing from the foundationdbcluster status.
I believe the operator logs from after the upgrade would be more interesting, to see what errors are logged. The trace files probably don’t contain much interesting information in this case, since the operator (or something else) seems to remove those entries from the ConfigMap.
@johscheuer Have you tried this upgrade in your environment (operator 0.48 to 1.4.1 and the database from 6.2.29 to 6.3.24)? Or is there a formal guide for such an upgrade? I want to confirm the correct steps. Thanks!