I tried to upgrade an FDB cluster from 6.2.98 to 6.3.24, and the upgrade failed. These are the steps I followed:
Set up an FDB 6.2 environment with the Apple 0.48 operator
Write some data to the FDB database
Remove the Apple 0.48 operator
Apply the v1beta2 CRD
Install the Apple 1.4.1 operator
Apply a new CR with the 6.3.24 image
In the end, some pods were still running the 6.2 image and were accessible, while other pods were running the 6.3 image but were not accessible. The total number of pods was also higher than the initial pod count.
I found the following message in the Apple operator log:
...
{"level":"info","ts":1666059628.73076,"logger":"controller","msg":"Reconciliation terminated early","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","subReconciler":"controllers.checkClientCompatibility","requeueAfter":60,"message":"2 clients do not support version 6.3.24: 10.254.13.122:47460:tls (fdb-kubernetes-operator), 10.254.13.122:58172:tls (fdb-kubernetes-operator)"}
...
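To make it easier to see which clients the operator is complaining about, here is a minimal Python sketch that pulls the client addresses out of a log line like the one above. It assumes the log line is a single JSON object and that the "message" field keeps the "address (description), address (description)" shape shown here; the function name and the constant are only for illustration and are not part of the operator.

```python
import json
import re
from typing import List, Tuple

# The log line from this post, split across string literals for readability.
SAMPLE_LINE = (
    '{"level":"info","ts":1666059628.73076,"logger":"controller",'
    '"msg":"Reconciliation terminated early","namespace":"testoperator1",'
    '"cluster":"mdm-foundationdb-ibm",'
    '"subReconciler":"controllers.checkClientCompatibility","requeueAfter":60,'
    '"message":"2 clients do not support version 6.3.24: '
    '10.254.13.122:47460:tls (fdb-kubernetes-operator), '
    '10.254.13.122:58172:tls (fdb-kubernetes-operator)"}'
)


def incompatible_clients(log_line: str) -> List[Tuple[str, str]]:
    """Return (address, description) pairs reported as incompatible.

    Returns an empty list for lines that are not JSON or that do not come
    from the checkClientCompatibility sub-reconciler.
    """
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return []
    if not isinstance(record, dict):
        return []
    if record.get("subReconciler") != "controllers.checkClientCompatibility":
        return []
    # Assumed message shape: "<n> clients do not support version <v>: <list>"
    _, separator, client_list = record.get("message", "").partition(": ")
    if not separator:
        return []
    return re.findall(r"(\S+) \(([^)]*)\)", client_list)


if __name__ == "__main__":
    for address, description in incompatible_clients(SAMPLE_LINE):
        print(address, description)
```

In my case both reported clients are described as fdb-kubernetes-operator and share the same IP address.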
@johscheuer, I want to clarify that I have run into a strange issue.
We have developed another operator based on the Apple operator, and I tried the following upgrade steps:
0. Set up the Apple 0.48 operator + a 6.2 FDB cluster + our operator
1. Remove our operator and upgrade our operator
2. Upgrade the CRD from v1beta1 to v1beta2
3. Remove the Apple 0.48 operator and install the Apple 1.4.1 operator
4. Apply a new CR YAML file with FDB version 6.3 defined.
With this sequence the upgrade succeeds: all pods are upgraded to 6.3 and the database is accessible afterwards.
But if I follow these steps instead, the upgrade fails.
0. Set up the Apple 0.48 operator + a 6.2 FDB cluster + our operator
1. Upgrade the CRD from v1beta1 to v1beta2
2. Remove our operator and upgrade our operator
3. Remove the Apple 0.48 operator and install the Apple 1.4.1 operator
4. Apply a new CR YAML file with FDB version 6.3 defined.
The only difference is that steps 1 and 2 are swapped, so I wonder why the second sequence does not work. In our test case we have to upgrade the Apple operator and our operator together, which means we need the second sequence to work.
@johscheuer Today, when we tried to upgrade FDB from 6.2 to 6.3, we encountered the following error message:
...
One or more of the processes in the cluster is incompatible with this version of fdbcli.\n\nSpecified timeout reached -- exiting...\
...
The following is the environment in the controller pod:
env:
...
FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/usr/lib/fdb
FDB_BINARY_DIR=/usr/lib/fdb
...
bash-4.4$ ls /usr/lib/fdb
6.2 6.3 libfdb_c_6.2.so libfdb_c_6.3.so
bash-4.4$ ls /usr/lib/fdb/6.2
fdbbackup fdbcli fdbdr fdbmonitor fdbrestore fdbserver
bash-4.4$ ls /usr/lib/fdb/6.3
fdbbackup fdbcli fdbdr fdbmonitor fdbrestore fdbserver
I don’t know why the incompatibility error occurred.
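To rule out a packaging mistake on our side, the directory listing above can be checked mechanically. The following is a minimal Python sketch, assuming the layout shown in the listing (a `libfdb_c_<version>.so` file plus a `<version>/` directory containing the binaries). The function name and the default list of required binaries are my own choices for illustration. It only confirms that the files are present; it does not explain the incompatibility error itself.

```python
import re
from pathlib import Path
from typing import Dict, List, Sequence


def missing_pieces(base_dir, required: Sequence[str] = ("fdbcli",)) -> Dict[str, List[str]]:
    """Map each version found under base_dir to the files missing for it.

    A version is discovered either from a libfdb_c_<version>.so file or from
    a directory named <version>. An empty list means nothing is missing.
    """
    base = Path(base_dir)
    versions = set()
    for entry in base.iterdir():
        library = re.fullmatch(r"libfdb_c_(\d+\.\d+)\.so", entry.name)
        if library:
            versions.add(library.group(1))
        elif entry.is_dir() and re.fullmatch(r"\d+\.\d+", entry.name):
            versions.add(entry.name)

    report = {}
    for version in sorted(versions):
        missing = []
        if not (base / f"libfdb_c_{version}.so").is_file():
            missing.append(f"libfdb_c_{version}.so")
        for binary in required:
            if not (base / version / binary).is_file():
                missing.append(f"{version}/{binary}")
        report[version] = missing
    return report


if __name__ == "__main__":
    # /usr/lib/fdb is the directory from the env output in this post.
    target = Path("/usr/lib/fdb")
    if target.is_dir():
        print(missing_pieces(target))
```

For the listing above this should report nothing missing for either 6.2 or 6.3, which matches what we see by hand.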
After a while (several hours later), we manually deleted the pods that were marked with removaltimestamp in the foundationdbcluster status, and the cluster then became healthy and the upgrade completed successfully. But when we tested it again and deleted the removable pods just 30 minutes after the upgrade, FDB remained unavailable. So it seems we need to wait for a certain amount of time before deleting the removable pods. How long do we need to wait, or what status should we check to confirm that it is safe to delete them? Thanks!
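To be concrete about which pods I mean by "removable", this is roughly how we pick them out. It is only a sketch: it takes the cluster resource as a Python dict (for example loaded from the JSON output of the cluster object) and assumes the status has a `processGroups` list whose entries carry `processGroupID` and `removalTimestamp` fields. Those field names are my assumption about the v1beta2 status layout, and the function name is made up. It does not tell us when deletion is safe, which is exactly what I am asking about.

```python
from typing import List


def marked_for_removal(cluster: dict) -> List[str]:
    """Return IDs of process groups that carry a removal timestamp.

    Field names (status.processGroups, processGroupID, removalTimestamp)
    are assumptions about the resource layout, not confirmed API.
    """
    groups = cluster.get("status", {}).get("processGroups") or []
    return [
        group.get("processGroupID", "<unknown>")
        for group in groups
        if group.get("removalTimestamp")
    ]


if __name__ == "__main__":
    # Hypothetical example data, not taken from our real cluster.
    example = {
        "status": {
            "processGroups": [
                {"processGroupID": "storage-1", "removalTimestamp": "2022-10-18T02:20:28Z"},
                {"processGroupID": "storage-2"},
            ]
        }
    }
    print(marked_for_removal(example))
```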