Upgrade of fdb cluster from 6.2 to 6.3 failed on k8s environment

liyyue · October 18, 2022, 2:21am

I tried to upgrade an fdb cluster from 6.2.98 to 6.3.24, but the upgrade failed. These were my steps:

Set up an fdb 6.2 environment with the apple 0.48 operator
Write some data into the fdb database
Remove the apple 0.48 operator
Apply the v1beta2 CRD
Install the apple 1.4.1 operator
Apply a new cr with the 6.3.24 image

In the end, some pods still ran the 6.2 image and were accessible, while others ran the 6.3 image but were not accessible. The number of pods was also higher than the initial pod count.
I found this error message in the apple operator log:

...
{"level":"info","ts":1666059628.73076,"logger":"controller","msg":"Reconciliation terminated early","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","subReconciler":"controllers.checkClientCompatibility","requeueAfter":60,"message":"2 clients do not support version 6.3.24: 10.254.13.122:47460:tls (fdb-kubernetes-operator), 10.254.13.122:58172:tls (fdb-kubernetes-operator)"}
...
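A minimal sketch for reading the log entry above: it parses the JSON line and pulls the client addresses out of the `message` field. The function name `incompatible_clients` and the regular expression are assumptions made for illustration; they are not part of the operator. Only the log line itself is taken from this post.

```python
import json
import re

# Log line copied verbatim from the operator output in this post.
LOG_LINE = '{"level":"info","ts":1666059628.73076,"logger":"controller","msg":"Reconciliation terminated early","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","subReconciler":"controllers.checkClientCompatibility","requeueAfter":60,"message":"2 clients do not support version 6.3.24: 10.254.13.122:47460:tls (fdb-kubernetes-operator), 10.254.13.122:58172:tls (fdb-kubernetes-operator)"}'

# Assumed shape of one client entry: "<ip>:<port>[:tls] (<description>)".
CLIENT_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+):(\d+)(?::tls)?\s+\(([^)]+)\)")


def incompatible_clients(line):
    """Return the clients listed by a checkClientCompatibility log entry."""
    entry = json.loads(line)
    if entry.get("subReconciler") != "controllers.checkClientCompatibility":
        return []
    return [
        {"ip": ip, "port": int(port), "description": description}
        for ip, port, description in CLIENT_RE.findall(entry.get("message", ""))
    ]


clients = incompatible_clients(LOG_LINE)
for client in clients:
    print(client["ip"], client["port"], client["description"])
```

For this log line, both reported clients share one IP address and carry the description `fdb-kubernetes-operator`, so the connections flagged as incompatible appear to come from an operator process rather than from an application client.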

johscheuer · October 19, 2022, 4:46am

I believe I already answered this in the other post from you: How to upgrade apple operator 0.48 to 1.4.1 without deleting the old crd? - #29 by johscheuer (or to be precise here: How to upgrade apple operator 0.48 to 1.4.1 without deleting the old crd? - #36 by johscheuer).

Is there any reason you have to upgrade the cluster after you upgraded the operator? Or is this only additional information?

And is there any issue that you observe or is the operator still progressing with the upgrade and those errors show up only for a short time?

liyyue · October 19, 2022, 8:45am

@johscheuer, I want to clarify that I encountered a strange issue:
We have developed another operator based on the apple operator, and I tried the following upgrade steps:

0. Setup apple 0.48 operator + 6.2 fdb cluster + our operator 
1. Remove our operator and upgrade our operator
2. Upgrade CRD from v1beta1 to v1beta2
3. Remove apple 0.48 operator and install apple 1.4.1 version operator
4. Apply new cr yaml file with 6.3 fdb version defined.

With this path the upgrade passes: all pods are upgraded to 6.3 and the database is accessible afterwards.

But if I try the following steps, the upgrade fails.

0. Setup apple 0.48 operator + 6.2 fdb cluster + our operator 
1. Upgrade CRD from v1beta1 to v1beta2
2. Remove our operator and upgrade our operator
3. Remove apple 0.48 operator and install apple 1.4.1 version operator
4. Apply new cr yaml file with 6.3 fdb version defined.

The only difference is that steps 1 and 2 are switched, so I wonder why it does not work. In our test case we have to upgrade the apple operator and our operator together, which means we need the second sequence of steps.

liyyue · October 19, 2022, 8:56am

For the Test 2 path:
Initial Dockerfile setting:

...
COPY --from=fdb62 /usr/bin/fdb* /usr/bin/fdb/6.2/
COPY --from=fdb63 /usr/bin/fdb* /usr/bin/fdb/6.3/
COPY --from=fdb63 /usr/lib/libfdb_c.so /usr/lib/fdb/libfdb_c_6.3.so

ENV FDB_BINARY_DIR=/usr/bin/fdb
ENV FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/usr/lib/fdb
...

But I encountered the following error:

...
"FoundationDB error code 1031 (Operation aborted because the transaction timed out)"
...

Then I changed the Dockerfile as follows:

...

COPY --from=fdb62 /usr/bin/fdb* /usr/bin/fdb/6.2/
COPY --from=fdb62 /usr/lib/libfdb_c.so /usr/lib/fdb/libfdb_c_6.2.so
COPY --from=fdb63 /usr/bin/fdb* /usr/bin/fdb/6.3/
COPY --from=fdb63 /usr/lib/libfdb_c.so /usr/lib/fdb/libfdb_c_6.3.so

ENV FDB_BINARY_DIR=/usr/bin/fdb
ENV FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/usr/lib/fdb
...

But I encountered the following error:

...
2 clients do not support version 6.3.24: 10.254.13.122:47460:tls (fdb-kubernetes-operator), 10.254.13.122:58172:tls (fdb-kubernetes-operator)"
...

But if I use the Test 1 steps with the initial Dockerfile configuration, the upgrade passes.
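The difference between the two Dockerfile variants can be checked mechanically. The sketch below compares the version directories under `FDB_BINARY_DIR` with the `libfdb_c_<version>.so` files under `FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY`. The helper name `versions_without_client_lib` is hypothetical, and the check only reflects the naming convention used in the Dockerfiles in this post. Whether the operator actually needs a client library for every version is not established in this thread.

```python
import re

# Naming conventions taken from the Dockerfile snippets in this post.
VERSION_DIR_RE = re.compile(r"^\d+\.\d+$")
CLIENT_LIB_RE = re.compile(r"^libfdb_c_(\d+\.\d+)\.so$")


def versions_without_client_lib(binary_dir_entries, lib_dir_entries):
    """Return versions that have binaries but no matching libfdb_c_<version>.so."""
    binary_versions = {e for e in binary_dir_entries if VERSION_DIR_RE.match(e)}
    lib_versions = set()
    for entry in lib_dir_entries:
        match = CLIENT_LIB_RE.match(entry)
        if match:
            lib_versions.add(match.group(1))
    return sorted(binary_versions - lib_versions)


# Initial Dockerfile: binaries for 6.2 and 6.3, client library only for 6.3.
initial = versions_without_client_lib(["6.2", "6.3"], ["libfdb_c_6.3.so"])

# Changed Dockerfile: a client library for each version.
changed = versions_without_client_lib(
    ["6.2", "6.3"], ["libfdb_c_6.2.so", "libfdb_c_6.3.so"]
)

print(initial)  # ['6.2']
print(changed)  # []
```

In a running pod the two input lists could come from `os.listdir` on the two directories; fixed lists are used here so the sketch runs without a cluster.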

liyyue · October 19, 2022, 9:01am

When I encounter the above error messages, the upgrade fails; even after waiting for 1 hour there is no progress.

liyyue · November 8, 2022, 8:56am

@johscheuer Today, when we tried to upgrade fdb from 6.2 to 6.3, we encountered the following error message:

 ...
 One or more of the processes in the cluster is incompatible with this version of fdbcli.\n\nSpecified timeout reached -- exiting...\
 ...

And the following is the env in the controller pod:

env:
...
FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/usr/lib/fdb
FDB_BINARY_DIR=/usr/lib/fdb
...
bash-4.4$ ls /usr/lib/fdb
6.2  6.3  libfdb_c_6.2.so  libfdb_c_6.3.so
bash-4.4$ ls /usr/lib/fdb/6.2
fdbbackup  fdbcli  fdbdr  fdbmonitor  fdbrestore  fdbserver
bash-4.4$ ls /usr/lib/fdb/6.3
fdbbackup  fdbcli  fdbdr  fdbmonitor  fdbrestore  fdbserver

I don’t know why the incompatible error occurred.

After a while (several hours later) we manually deleted the pods marked with removaltimestamp in the foundationdbcluster status, and afterwards the cluster became healthy and the upgrade succeeded. But when we tested it again and deleted the removable pods just 30 minutes after the upgrade, fdb remained unavailable. So it seems we need to wait for a certain time before deleting the removable pods. How long do we need to wait, or what status should we check to confirm it is time to delete the removable pods? Thanks!
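One possible thing to look at before touching any pods is the machine-readable status from fdbcli (`status json`). The sketch below is not an answer to how long to wait; it only summarizes whether the database reports itself available and which versions the listed processes run. It assumes the `client.database_status.available` and `cluster.processes[*].version` fields of the status document. The function name `summarize_status` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_status(status):
    """Return (database_available, Counter of process versions)."""
    available = (
        status.get("client", {}).get("database_status", {}).get("available", False)
    )
    processes = status.get("cluster", {}).get("processes", {})
    versions = Counter(p.get("version", "unknown") for p in processes.values())
    return available, versions


# Hypothetical, heavily trimmed status document; a real one has many more fields.
SAMPLE_STATUS = {
    "client": {"database_status": {"available": True}},
    "cluster": {
        "processes": {
            "process-a": {"address": "10.0.0.1:4500:tls", "version": "6.3.24"},
            "process-b": {"address": "10.0.0.2:4500:tls", "version": "6.3.24"},
        }
    },
}

available, versions = summarize_status(SAMPLE_STATUS)
single_version = len(versions) == 1
print(available, dict(versions), single_version)
```

With real data, the dictionary would come from `json.loads` on the fdbcli output. An unavailable database or more than one reported version would be a reason to hold off on deleting pods by hand.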
