Getting timeout error while reconciling FoundationDB cluster in Kubernetes

antonprabin · June 1, 2020, 10:44am

Hello team,

I am getting timeout errors while deploying the FDB operator on Kubernetes 16.10. The error is below:

====
2020-06-01T10:30:24.994Z ERROR controller-runtime.controller Reconciler error {"controller": "foundationdbcluster", "request": "default/sample-cluster", "error": "Get http://10.244.1.21:8080/substitutions: dial tcp 10.244.1.21:8080: i/o timeout"}

Please see the related cluster state below:

====
01:~/yamls$ kubectl get pods
NAME                                                          READY   STATUS    RESTARTS   AGE
fdb-kubernetes-operator-controller-manager-665466bc85-6l8xc   1/1     Running   214        6d2h
sample-cluster-log-1                                          2/2     Running   0          8m34s
sample-cluster-log-2                                          2/2     Running   0          8m33s
sample-cluster-log-3                                          2/2     Running   0          8m33s
sample-cluster-stateless-1                                    2/2     Running   0          8m34s
sample-cluster-stateless-2                                    2/2     Running   0          8m34s
sample-cluster-stateless-3                                    2/2     Running   0          8m34s
sample-cluster-storage-1                                      2/2     Running   0          8m34s
sample-cluster-storage-2                                      2/2     Running   0          8m34s
sample-cluster-storage-3                                      2/2     Running   0          8m34s

-01:~/yamls$ kubectl get pv
NAME                  CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                   STORAGECLASS   REASON   AGE
fdb-pv-volume         5Gi        RWO            Retain           Available                                           standard                17m
sample-cluster-pv1    130Gi      RWO            Retain           Bound       default/sample-cluster-log-1-data                               42m
sample-cluster-pv10   130Gi      RWO            Retain           Bound       default/sample-cluster-storage-1-data                           39m
sample-cluster-pv11   130Gi      RWO            Retain           Available                                                                   27m
sample-cluster-pv12   130Gi      RWO            Retain           Available                                                                   27m
sample-cluster-pv13   130Gi      RWO            Retain           Bound       default/sample-cluster-storage-2-data                           26m
sample-cluster-pv14   130Gi      RWO            Retain           Available                                                                   26m
sample-cluster-pv15   130Gi      RWO            Retain           Available                                                                   26m
sample-cluster-pv2    130Gi      RWO            Retain           Bound       default/sample-cluster-log-2-data                               42m
sample-cluster-pv3    130Gi      RWO            Retain           Bound       default/sample-cluster-storage-3-data                           41m
sample-cluster-pv4    130Gi      RWO            Retain           Available                                                                   41m
sample-cluster-pv5    130Gi      RWO            Retain           Available                                                                   41m
sample-cluster-pv6    130Gi      RWO            Retain           Bound       default/sample-cluster-log-3-data                               40m
sample-cluster-pv8    130Gi      RWO            Retain           Available                                                                   40m
sample-cluster-pv9    130Gi      RWO            Retain           Available                                                                   39m

-01:~/yamls$ kubectl get pvc
NAME                            STATUS   VOLUME                CAPACITY   ACCESS MODES   STORAGECLASS   AGE
sample-cluster-log-1-data       Bound    sample-cluster-pv1    130Gi      RWO                           8m48s
sample-cluster-log-2-data       Bound    sample-cluster-pv2    130Gi      RWO                           8m48s
sample-cluster-log-3-data       Bound    sample-cluster-pv6    130Gi      RWO                           8m47s
sample-cluster-storage-1-data   Bound    sample-cluster-pv10   130Gi      RWO                           8m48s
sample-cluster-storage-2-data   Bound    sample-cluster-pv13   130Gi      RWO                           8m48s
sample-cluster-storage-3-data   Bound    sample-cluster-pv3    130Gi      RWO                           8m48s

-01:~/yamls$ kubectl get foundationdbcluster
NAME             GENERATION   RECONCILED   HEALTHY
sample-cluster   3
-01:~/yamls$

====

I am new to FoundationDB and to setting up the operator in k8s, so I would appreciate some input on this. I am not sure whether I am doing something wrong here.

Regards,
Prabin

john_brownlee · June 1, 2020, 4:46pm

That API call is made by the operator to the sidecar process in order to get the instance-specific information that will be substituted into the start command. If that’s timing out, it suggests there’s a network problem or ACL problem between the operator and the FDB pod.
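To narrow down which of those it is, one option is to try the same TCP connection from where the operator runs (for example, a debug pod on the operator's node or in its namespace). The following is only a rough sketch in Python, not something the operator ships: the `probe` function is a hypothetical helper, and the address 10.244.1.21:8080 is simply the one from the error message above.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and classify the outcome.

    Hypothetical helper for illustration; it checks reachability only
    and does not speak HTTP to the sidecar.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # The TCP handshake completed, so the port is reachable.
            return "open"
    except socket.timeout:
        # No answer at all: packets are probably being dropped on the way
        # (network policy, firewall rule, or pod-to-pod routing).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "no route to host".
        return f"error: {exc}"
```

Calling `probe("10.244.1.21", 8080)` and getting `"timeout"` would match the operator's `i/o timeout` and point toward traffic being dropped between the two pods. `"refused"` would instead suggest the pod is reachable but the sidecar is not listening on that port, and `"open"` would mean basic connectivity is fine from wherever the check was run.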
