Cannot enable TLS without DISABLE_SIDECAR_TLS_CHECK=1

Hi all,
I tried to deploy an FDB cluster with TLS enabled, following the documentation at https://github.com/FoundationDB/fdb-kubernetes-operator/blob/master/docs/manual/tls.md

If I do not set DISABLE_SIDECAR_TLS_CHECK=1, the operator always logs this error:

{
  "level": "error",
  "ts": 1626526429.2545986,
  "logger": "controller-runtime.manager.controller.foundationdbcluster",
  "msg": "Reconciler error",
  "reconciler group": "apps.foundationdb.org",
  "reconciler kind": "FoundationDBCluster",
  "name": "fdb-cluster",
  "namespace": "default",
  "error": "GET https://10.42.2.14:8080/substitutions giving up after 11 attempt(s): Get \"https://10.42.2.14:8080/substitutions\": x509: cannot validate certificate for 10.42.2.14 because it doesn't contain any IP SANs",
  "stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:302\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:99"
}

Could you help me solve this?

"https://10.42.2.14:8080/substitutions: x509: cannot validate certificate for 10.42.2.14 because it doesn't contain any IP SANs" tells you the issue. If you want to run a TLS cluster without DISABLE_SIDECAR_TLS_CHECK, you have to provide valid TLS certificates that match the IP address (currently the operator uses the IP address to connect to the sidecar). That's also mentioned in the linked manual: fdb-kubernetes-operator/tls.md at master · FoundationDB/fdb-kubernetes-operator · GitHub. This is currently a limitation of how TLS is handled. We have a planned feature to make this more flexible: Custom validation for TLS connections to the sidecar · Issue #756 · FoundationDB/fdb-kubernetes-operator · GitHub. If you need this feature, feel free to work on it and file a PR.

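To make the failure mode concrete, here is a minimal Python sketch of the check that fails. It is a simplified model and not the operator's code (the operator is written in Go and relies on Go's TLS stack): when a client connects to an IP address, the certificate needs a matching IP entry in its subjectAltName, and DNS entries do not count. The function name `covers_ip` and the sample SAN lists are hypothetical; the tuple format mirrors what Python's `ssl.SSLSocket.getpeercert()` reports under "subjectAltName".

```python
import ipaddress


def covers_ip(san_entries, target_ip):
    """Return True if any IP-type SAN entry equals target_ip.

    san_entries uses the ("DNS", name) / ("IP Address", addr) tuple format
    that ssl.SSLSocket.getpeercert() reports under "subjectAltName".
    DNS entries are deliberately ignored: a DNS name never validates an IP.
    """
    target = ipaddress.ip_address(target_ip)
    for kind, value in san_entries:
        if kind == "IP Address" and ipaddress.ip_address(value) == target:
            return True
    return False


# Hypothetical certificate with only a wildcard DNS SAN.
dns_only = (("DNS", "*.fdb-cluster.default.svc.cluster.local"),)
# Hypothetical certificate that also lists the pod IP from the error message.
with_ip = dns_only + (("IP Address", "10.42.2.14"),)

print(covers_ip(dns_only, "10.42.2.14"))  # False
print(covers_ip(with_ip, "10.42.2.14"))   # True
```

A certificate that only carries DNS names behaves like `dns_only` here, which is consistent with the "doesn't contain any IP SANs" error in the log above.
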
Hi @johscheuer
Thanks for your answer. Can you give me an example of peer verification?
I tried some rules, but peer verification is not working. Here is my config:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fdb-cluster-certificate
  namespace: default
spec:
  dnsNames:
    - "*.fdb-cluster.default.svc.cluster.local"
  secretName: fdb-cluster-certificate
  issuerRef:
    name: fdb-cluster-issuer
    kind: Issuer

routing:
  publicIPSource: "service"
mainContainer:
  enableTls: true
  peerVerificationRules: "subjectAltName.DNS>=fdb-cluster.default.svc.cluster.local"
sidecarContainer:
  enableTls: true
  peerVerificationRules: "subjectAltName.DNS>=fdb-cluster.default.svc.cluster.local"

The official docs have some examples: Transport Layer Security — FoundationDB 6.3. If anything is missing, please let me know or update the docs with the missing pieces. You can also look at the trace logs in the FoundationDB containers to see what the issue with the verification rule is (or rather, why the certificate is not matching). Depending on how cert-manager generates the certificates, they might be seen as self-signed rather than "official" certificates.

Hi @johscheuer
I followed the official documentation, and the peerVerificationRules work for the sidecar but not for the main container.
Here is my config:

mainContainer:
  enableTls: true
  peerVerificationRules: "S.subjectAltName<=DNS:.fdb-cluster.default.svc.cluster.local"
sidecarContainer:
  enableTls: true
  peerVerificationRules: "S.subjectAltName<=DNS:.fdb-cluster.default.svc.cluster.local"

Could you take a look at the FDB trace logs for any TLS issues? I assume that's an issue with self-signed certificates. If your root CA is a self-signed cert, it must contain the AuthorityKeyId that points to itself; otherwise FDB will reject the self-signed certificate as a root cert.

Hi,
I tried the rules as well as adding the AuthorityKeyID in the root certificate. In both cases, I get this error:
{"level":"error","ts":1645030087.361136,"logger":"controller","msg":"Error deserializing pod substitutions","responseBody":"\n\n \n <meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\">\n Error response\n \n \nError response\nError code: 401\nMessage: Client certificate was not approved.\nError code explanation: 401 - No permission -- see authorization schemes.\n \n\n","error":"invalid character '<' looking for beginning of value","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/internal.(*realFdbPodClient).GetVariableSubstitutions\n\t/workspace/internal/pod_client.go:216\Ngithub.com).Reconcile\n\t/workspace/controllers/cluster_controller.go:155\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214"}
Any idea what that is about?

This error comes from one of the sidecars and means that the certificate provided by the operator doesn't match the peerVerificationRules configured for the sidecarContainer.
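
The "invalid character '<'" part of that log is a secondary symptom: the sidecar answered with an HTML 401 page, and the response body was then decoded as JSON. A minimal Python sketch of the same effect follows. The helper `decode_substitutions`, the stand-in HTML body, and the `EXAMPLE_KEY` entry are all hypothetical, and Python's error text differs from the Go message in the log.

```python
import json


def decode_substitutions(status, body):
    """Decode a JSON response body, checking the HTTP status first."""
    if status != 200:
        # Report the HTTP failure instead of a confusing JSON parse error.
        raise RuntimeError(f"sidecar returned HTTP {status}")
    return json.loads(body)


# Stand-in for the HTML error page returned with the 401 (hypothetical).
html_401 = "<html><body><p>Error code: 401</p></body></html>"

# Decoding the HTML page directly fails at the very first character.
try:
    json.loads(html_401)
except json.JSONDecodeError as err:
    print("JSON decoding failed:", err)

# Checking the status first points at the actual cause.
try:
    decode_substitutions(401, html_401)
except RuntimeError as err:
    print(err)

# A successful response decodes normally (the key is a made-up example).
print(decode_substitutions(200, '{"EXAMPLE_KEY": "value"}'))
```

So the line to act on in the log is "Client certificate was not approved", not the JSON error.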

The rule I used is peerVerificationRules: "Check.Valid=1". Wouldn't that match anything?