Upgrade of fdb cluster from 6.2 to 6.3 failed on k8s environment

I tried to upgrade an fdb cluster from 6.2.98 to 6.3.24 and the upgrade failed. These are my steps:

  1. Set up an fdb 6.2 environment with the apple 0.48 operator
  2. Write some data to the fdb database
  3. Remove the apple 0.48 operator
  4. Apply the v1beta2 CRD
  5. Install the apple 1.4.1 operator
  6. Apply the new CR with the 6.3.24 image

In the end, some pods still run the 6.2 image and are accessible, while other pods run the 6.3 image but are not accessible. The number of pods is also higher than the initial pod count.
I found this message in the apple operator log:

...
{"level":"info","ts":1666059628.73076,"logger":"controller","msg":"Reconciliation terminated early","namespace":"testoperator1","cluster":"mdm-foundationdb-ibm","subReconciler":"controllers.checkClientCompatibility","requeueAfter":60,"message":"2 clients do not support version 6.3.24: 10.254.13.122:47460:tls (fdb-kubernetes-operator), 10.254.13.122:58172:tls (fdb-kubernetes-operator)"}
...
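
To make it easier to see which clients the operator is complaining about, here is a minimal sketch that parses a log line like the one above. The regular expressions are inferred only from this single message, so other operator versions may word it differently, and the function name `incompatible_clients` is made up for illustration.

```python
import json
import re

# Sample line copied from the operator log quoted above.
LOG_LINE = (
    '{"level":"info","ts":1666059628.73076,"logger":"controller",'
    '"msg":"Reconciliation terminated early","namespace":"testoperator1",'
    '"cluster":"mdm-foundationdb-ibm",'
    '"subReconciler":"controllers.checkClientCompatibility","requeueAfter":60,'
    '"message":"2 clients do not support version 6.3.24: '
    '10.254.13.122:47460:tls (fdb-kubernetes-operator), '
    '10.254.13.122:58172:tls (fdb-kubernetes-operator)"}'
)

# Assumption: the message format is inferred from the one example above.
MESSAGE_RE = re.compile(
    r"^\d+ clients? do not support version (?P<version>[\d.]+): (?P<clients>.+)$"
)
CLIENT_RE = re.compile(r"^(?P<address>\S+) \((?P<name>[^)]*)\)$")


def incompatible_clients(log_line):
    """Return (target_version, [(address, client_name), ...]).

    Returns None when the line is not a client compatibility message.
    """
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    if entry.get("subReconciler") != "controllers.checkClientCompatibility":
        return None
    match = MESSAGE_RE.match(entry.get("message", ""))
    if match is None:
        return None
    clients = []
    for item in match.group("clients").split(", "):
        client = CLIENT_RE.match(item.strip())
        if client:
            clients.append((client.group("address"), client.group("name")))
    return match.group("version"), clients


version, clients = incompatible_clients(LOG_LINE)
for address, name in clients:
    print(f"{name} at {address} does not support {version}")
```

For the line above, both reported clients carry the name fdb-kubernetes-operator and the same IP, which suggests the connections being flagged come from the operator pod itself rather than from an application client.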

I believe I already answered this in the other post from you: How to upgrade apple operator 0.48 to 1.4.1 without deleting the old crd? - #29 by johscheuer (or to be precise here: How to upgrade apple operator 0.48 to 1.4.1 without deleting the old crd? - #36 by johscheuer).

Is there any reason you have to upgrade the cluster after you upgraded the operator? Or is this only additional information?

And is there any issue that you observe or is the operator still progressing with the upgrade and those errors show up only for a short time?

@johscheuer, I want to clarify that I encountered a strange issue.
We have developed another operator based on the apple operator, and I have tried the following upgrade steps:

0. Set up apple 0.48 operator + 6.2 fdb cluster + our operator
1. Remove our operator and upgrade our operator
2. Upgrade the CRD from v1beta1 to v1beta2
3. Remove the apple 0.48 operator and install the apple 1.4.1 operator
4. Apply the new CR YAML file with fdb version 6.3 defined.

With this path the upgrade succeeds: all pods are upgraded to 6.3 and the database is accessible afterwards.

But if I try the following steps, the upgrade will fail.

0. Set up apple 0.48 operator + 6.2 fdb cluster + our operator
1. Upgrade the CRD from v1beta1 to v1beta2
2. Remove our operator and upgrade our operator
3. Remove the apple 0.48 operator and install the apple 1.4.1 operator
4. Apply the new CR YAML file with fdb version 6.3 defined.

The only difference is that steps 1 and 2 are switched, so I wonder why the second path does not work. In our test case we have to upgrade the apple operator and our operator together, which means we need the second path.

For the Test 2 path, the initial Dockerfile settings were:

...
COPY --from=fdb62 /usr/bin/fdb* /usr/bin/fdb/6.2/
COPY --from=fdb63 /usr/bin/fdb* /usr/bin/fdb/6.3/
COPY --from=fdb63 /usr/lib/libfdb_c.so /usr/lib/fdb/libfdb_c_6.3.so

ENV FDB_BINARY_DIR=/usr/bin/fdb
ENV FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/usr/lib/fdb
...

With these settings I encountered the following error:

...
"FoundationDB error code 1031 (Operation aborted because the transaction timed out)"
...

Then I changed the Dockerfile as follows:

...

COPY --from=fdb62 /usr/bin/fdb* /usr/bin/fdb/6.2/
COPY --from=fdb62 /usr/lib/libfdb_c.so /usr/lib/fdb/libfdb_c_6.2.so
COPY --from=fdb63 /usr/bin/fdb* /usr/bin/fdb/6.3/
COPY --from=fdb63 /usr/lib/libfdb_c.so /usr/lib/fdb/libfdb_c_6.3.so

ENV FDB_BINARY_DIR=/usr/bin/fdb
ENV FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/usr/lib/fdb
...

But then I encountered the following error:

...
2 clients do not support version 6.3.24: 10.254.13.122:47460:tls (fdb-kubernetes-operator), 10.254.13.122:58172:tls (fdb-kubernetes-operator)"
...

But if I use the Test 1 steps with the initial Dockerfile configuration, the upgrade passes.

When I hit the error messages above, the upgrade fails: even after waiting for 1 hour there is no progress.

@johscheuer Today when we tried the upgrade from fdb 6.2 to 6.3, we encountered the following error message:

 ...
 One or more of the processes in the cluster is incompatible with this version of fdbcli.\n\nSpecified timeout reached -- exiting...\
 ...

And the following is the env in the controller pod:

env:
...
FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/usr/lib/fdb
FDB_BINARY_DIR=/usr/lib/fdb
...	 
bash-4.4$ ls /usr/lib/fdb
6.2  6.3  libfdb_c_6.2.so  libfdb_c_6.3.so
bash-4.4$ ls /usr/lib/fdb/6.2
fdbbackup  fdbcli  fdbdr  fdbmonitor  fdbrestore  fdbserver
bash-4.4$ ls /usr/lib/fdb/6.3
fdbbackup  fdbcli  fdbdr  fdbmonitor  fdbrestore  fdbserver

I don’t know why the incompatible error occurred.
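
The Dockerfile snippets and the `ls` output in this thread suggest a layout where the binaries for each version live under `<FDB_BINARY_DIR>/<major.minor>/` and the client libraries are named `libfdb_c_<major.minor>.so` in the external client directory. As a rough sanity check, a sketch like the following could be run inside the controller pod to confirm that both versions are present in the directories the environment variables point to. The layout convention is only inferred from this thread, and the helper name `check_fdb_layout` is made up for illustration.

```python
import os


def check_fdb_layout(binary_dir, client_dir, versions, binaries=("fdbcli",)):
    """Return a list of problems; an empty list means the assumed layout is present.

    Assumed layout (inferred from the thread, not from operator docs):
      <binary_dir>/<version>/<binary>        e.g. /usr/bin/fdb/6.3/fdbcli
      <client_dir>/libfdb_c_<version>.so     e.g. /usr/lib/fdb/libfdb_c_6.3.so
    """
    problems = []
    for version in versions:
        for binary in binaries:
            path = os.path.join(binary_dir, version, binary)
            if not (os.path.isfile(path) and os.access(path, os.X_OK)):
                problems.append(f"missing or not executable: {path}")
        library = os.path.join(client_dir, f"libfdb_c_{version}.so")
        if not os.path.isfile(library):
            problems.append(f"missing client library: {library}")
    return problems


# Example call using the variables shown in the pod environment; the
# fallbacks are the values from the Dockerfile earlier in the thread.
# problems = check_fdb_layout(
#     os.environ.get("FDB_BINARY_DIR", "/usr/bin/fdb"),
#     os.environ.get("FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY", "/usr/lib/fdb"),
#     versions=("6.2", "6.3"),
# )
```

An empty result only shows that the files exist where the variables point. It does not prove that the operator actually loaded the 6.3 client library, so the operator log is still the place to confirm that.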

After a while (several hours later) we manually deleted the pods that were marked with removaltimestamp in the foundationdbcluster status, and the cluster then became healthy and the upgrade succeeded. But when we tested it again and deleted the removable pods just 30 minutes after the upgrade, fdb stayed unavailable. So it seems we need to wait for some time before deleting the removable pods. How long do we need to wait, or what status should we check to confirm that it is safe to delete the removable pods? Thanks!
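
To at least see which process groups are marked for removal and for how long, a small sketch like this could be run against the JSON output of `kubectl get foundationdbcluster <name> -o json`. The field names `status.processGroups`, `processGroupID` and `removalTimestamp` are assumptions about the v1beta2 status layout and should be checked against the actual output. The sketch does not tell you when deletion is safe, it only reports what is marked.

```python
from datetime import datetime, timezone


def pending_removals(cluster, now=None):
    """Return [(process_group_id, time_since_marked), ...] for groups marked for removal.

    `cluster` is the parsed JSON of a FoundationDBCluster resource.
    Field names are assumptions; adjust them to match the real status.
    """
    now = now or datetime.now(timezone.utc)
    result = []
    for group in cluster.get("status", {}).get("processGroups", []):
        stamp = group.get("removalTimestamp")
        if not stamp:
            continue
        # Kubernetes serializes timestamps as RFC 3339 in UTC, e.g. 2022-10-18T02:00:00Z.
        marked = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        result.append((group.get("processGroupID", "<unknown>"), now - marked))
    return result


# Example usage with a file saved from kubectl (the file name is hypothetical):
# import json
# with open("cluster.json") as handle:
#     for group_id, age in pending_removals(json.load(handle)):
#         print(f"{group_id} marked for removal {age} ago")
```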