Upgrade operator from pod IP to service IP failed

Dear experts:
We are upgrading the operator from 0.48 to 1.4.1. Before the upgrade, the cluster spec had:

    routing:
       publicIPSource: pod

After the upgrade, we switched to the default publicIPSource (service). Sometimes we get this error:

{"level":"error","ts":1676000796.1141114,"logger":"controller","msg":"Error in reconciliation","namespace":"cpd461","cluster":"mdm-foundationdb-1675996842503580","subReconciler":"controllers.addPods","requeueAfter":0,"error":"Service \"mdm-foundationdb-1675996842503580-cluster-controller-1\" not found","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBClusterReconciler).Reconcile\n\t/workspace/controllers/cluster_controller.go:183\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}

Checking the services now, there is only one called mdm-foundationdb-1675996842503580-cluster-controller-2; there is no mdm-foundationdb-1675996842503580-cluster-controller-1 service.
Checking the foundationdbcluster status (only the problematic part is pasted):

  processGroups:
  - addresses:
    - 10.254.20.175
    processClass: cluster_controller
    processGroupConditions:
    - timestamp: 1676000647
      type: ResourcesTerminating
    processGroupID: cluster_controller-1
    removalTimestamp: "2023-02-10T03:44:04Z"
  - addresses:
    - 172.30.201.194
    processClass: cluster_controller
    processGroupConditions:
    - timestamp: 1676000649
      type: SidecarUnreachable
    - timestamp: 1676000683
      type: MissingPod
    processGroupID: cluster_controller-2

Why is the operator still checking for the old pod's service? Is this a bug? How can we bypass this error?

It’s not a bug that the operator checks for the service, but it is a bug that the service is not recreated. Have you manually deleted the service? The code shows the reason: fdb-kubernetes-operator/add_pods.go at main · FoundationDB/fdb-kubernetes-operator · GitHub (the operator will try to recreate the Pod). This check in fdb-kubernetes-operator/add_services.go at main · FoundationDB/fdb-kubernetes-operator · GitHub has to be corrected to match the one in fdb-kubernetes-operator/add_pods.go at main · FoundationDB/fdb-kubernetes-operator · GitHub.

Do you mean the service for the old pod mdm-foundationdb-1675996842503580-cluster-controller-1 should be created by the operator? But the old pod is actually using a pod IP, not a service IP. Do you mean the operator will create the service for mdm-foundationdb-1675996842503580-cluster-controller-1 and delete it once the old pod is deleted?
I didn’t manually delete any service.

For the current code, do you mean we should add the same check (&& processGroup.IsExcluded()) to add_services.go?

@johscheuer it sounds like I don’t have the authority to raise a PR or open an issue under Pull requests · FoundationDB/fdb-kubernetes-operator · GitHub
thorn126:liuyan-upgrade-from-podip-to-serviceip

I believe @ammolitor is working on that.