I’m looking for some guidance on using the DNS option for the cluster file connection string. I’m able to deploy the cluster fine when using pod or service IPs. I have tried this setup with multiple image versions of the operator/fdb/sidecar, from 7.1.25 to 7.2.0 (released yesterday), all with the same results. Between attempts, I delete the operator deployment and the FDB cluster.
When I enable useDNSInClusterFile, the operator creates the cluster, picks coordinators, and initializes the cluster file with the connection string I expect. However, the operator then fails with FoundationDB error code 2104 (Connection string invalid).
...
{"level":"info","ts":1669932412.0034533,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"foundationdb","cluster":"foundationdb-cluster","subReconciler":"controllers.updateLabels"}
{"level":"info","ts":1669932412.0045478,"logger":"controller","msg":"Attempting to run sub-reconciler","namespace":"foundationdb","cluster":"foundationdb-cluster","subReconciler":"controllers.updateDatabaseConfiguration"}
{"level":"info","ts":1669932412.0046866,"logger":"fdbclient","msg":"Fetch values from FDB","namespace":"foundationdb","cluster":"foundationdb-cluster","key":"\ufffd\ufffd/status/json"}
{"level":"info","ts":1669932412.0048652,"logger":"fdbclient","msg":"Done fetching values from FDB","namespace":"foundationdb","cluster":"foundationdb-cluster","key":"\ufffd\ufffd/status/json"}
{"level":"error","ts":1669932412.0050175,"logger":"controller","msg":"Error in reconciliation","namespace":"foundationdb","cluster":"foundationdb-cluster","subReconciler":"controllers.updateDatabaseConfiguration","requeueAfter":0,"error":"FoundationDB error code 2104 (Connection string invalid)","stacktrace":"github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBClusterReconciler).Reconcile\n\t/workspace/controllers/cluster_controller.go:180\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234"}
{"level":"error","ts":1669932412.005133,"msg":"Reconciler error","controller":"foundationdbcluster","controllerGroup":"apps.foundationdb.org","controllerKind":"FoundationDBCluster","foundationDBCluster":{"name":"foundationdb-cluster","namespace":"foundationdb"},"namespace":"foundationdb","name":"foundationdb-cluster","reconcileID":"31fb36e8-85f9-4d36-b3ac-a93c1c0d3a88","error":"FoundationDB error code 2104 (Connection string invalid)","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234"}
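The operator writes one JSON object per log line, so when the full log is long the failing sub-reconciler can be pulled out mechanically. A minimal Python sketch, assuming the log has been saved from `kubectl logs` to a file; the function name `reconcile_errors` is hypothetical:

```python
import json


def reconcile_errors(log_lines):
    """Return (subReconciler, error) pairs from error-level JSON log entries.

    Assumes one JSON object per line, as in the operator output above; lines
    that are not JSON (such as an elided "...") are skipped.
    """
    errors = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("level") != "error":
            continue
        # The generic "Reconciler error" entry has no subReconciler field.
        errors.append((entry.get("subReconciler", "<none>"), entry.get("error", "")))
    return errors
```

For example, `with open("operator.log") as f: print(reconcile_errors(f))` (file name hypothetical) would list `controllers.updateDatabaseConfiguration` together with the 2104 error for the log above.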
The generated cluster config:
cluster-file: >-
  foundationdb_cluster:LNg3SuB6Knl88bYxNJ5scY2gkDXr1fsp@foundationdb-cluster-storage-1.foundationdb-cluster.foundationdb.svc.cluster.local:4501,foundationdb-cluster-storage-2.foundationdb-cluster.foundationdb.svc.cluster.local:4501,foundationdb-cluster-storage-3.foundationdb-cluster.foundationdb.svc.cluster.local:4501
fdbmonitor-conf-cluster_controller: |-
  [general]
  kill_on_configuration_change = false
  restart_delay = 60
  [fdbserver.1]
  command = $BINARY_DIR/fdbserver
  cluster_file = /var/fdb/data/fdb.cluster
  seed_cluster_file = /var/dynamic-conf/fdb.cluster
  public_address = $FDB_PUBLIC_IP:4501
  class = cluster_controller
  logdir = /var/log/fdb-trace-logs
  loggroup = foundationdb-cluster
  datadir = /var/fdb/data
  locality_instance_id = $FDB_INSTANCE_ID
  locality_machineid = $FDB_MACHINE_ID
  locality_zoneid = $FDB_ZONE_ID
  listen_address = $FDB_POD_IP:4501
  knob_disable_posix_kernel_aio = 1
  locality_dns_name = $FDB_DNS_NAME
fdbmonitor-conf-log: |-
  [general]
  kill_on_configuration_change = false
  restart_delay = 60
  [fdbserver.1]
  command = $BINARY_DIR/fdbserver
  cluster_file = /var/fdb/data/fdb.cluster
  seed_cluster_file = /var/dynamic-conf/fdb.cluster
  public_address = $FDB_PUBLIC_IP:4501
  class = log
  logdir = /var/log/fdb-trace-logs
  loggroup = foundationdb-cluster
  datadir = /var/fdb/data
  locality_instance_id = $FDB_INSTANCE_ID
  locality_machineid = $FDB_MACHINE_ID
  locality_zoneid = $FDB_ZONE_ID
  listen_address = $FDB_POD_IP:4501
  knob_disable_posix_kernel_aio = 1
  locality_dns_name = $FDB_DNS_NAME
fdbmonitor-conf-storage: |-
  [general]
  kill_on_configuration_change = false
  restart_delay = 60
  [fdbserver.1]
  command = $BINARY_DIR/fdbserver
  cluster_file = /var/fdb/data/fdb.cluster
  seed_cluster_file = /var/dynamic-conf/fdb.cluster
  public_address = $FDB_PUBLIC_IP:4501
  class = storage
  logdir = /var/log/fdb-trace-logs
  loggroup = foundationdb-cluster
  datadir = /var/fdb/data
  locality_instance_id = $FDB_INSTANCE_ID
  locality_machineid = $FDB_MACHINE_ID
  locality_zoneid = $FDB_ZONE_ID
  listen_address = $FDB_POD_IP:4501
  knob_disable_posix_kernel_aio = 1
  locality_dns_name = $FDB_DNS_NAME
running-version: 7.2.0
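The `cluster-file` value above follows the documented connection-string shape `description:ID@coordinator,coordinator,...`, where the description may contain alphanumerics and underscores and the ID alphanumerics only. As a quick offline sanity check, here is a minimal Python sketch that splits such a string and classifies each coordinator as an IP or a hostname. The function names and the RFC 1123-style hostname regex are assumptions for illustration; this is not FoundationDB’s own parser, so passing this check does not guarantee that a particular client library version accepts the string.

```python
import ipaddress
import re

# Documented shape: description:ID@coordinator,coordinator,...
# description: alphanumerics and underscores; ID: alphanumerics only.
_PREFIX = re.compile(r"^(?P<desc>[A-Za-z0-9_]+):(?P<id>[A-Za-z0-9]+)@(?P<coords>.+)$")
# RFC 1123-style hostname label (an assumption, not FoundationDB's regex).
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def classify_host(host):
    """Return "ip" or "hostname" for a coordinator host, or raise ValueError."""
    try:
        # Brackets are stripped so IPv6 entries like [::1] also parse.
        ipaddress.ip_address(host.strip("[]"))
        return "ip"
    except ValueError:
        pass
    if len(host) <= 253 and all(_LABEL.match(label) for label in host.split(".")):
        return "hostname"
    raise ValueError(f"invalid coordinator host: {host!r}")


def parse_connection_string(conn):
    """Return (description, id, [(host, port, kind, tls), ...])."""
    m = _PREFIX.match(conn.strip())
    if not m:
        raise ValueError("expected description:ID@coordinators")
    coordinators = []
    for entry in m.group("coords").split(","):
        tls = entry.endswith(":tls")
        if tls:
            entry = entry[: -len(":tls")]
        # The port is whatever follows the last colon.
        host, sep, port = entry.rpartition(":")
        if not sep or not port.isdigit() or not 0 < int(port) < 65536:
            raise ValueError(f"invalid coordinator entry: {entry!r}")
        coordinators.append((host, int(port), classify_host(host), tls))
    return m.group("desc"), m.group("id"), coordinators
```

Run against the `cluster-file` value above, it reports three `hostname` coordinators on port 4501, which is consistent with the hostnames themselves being well formed.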
If I exec into one of the pods and check the status, I get the following:
Using cluster file `/var/dynamic-conf/fdb.cluster'.
Unable to communicate with the cluster controller at 172.20.179.0:4501 to get
status.
Configuration:
  Redundancy mode - unknown
  Storage engine - unknown
  Encryption at-rest - disabled
  Coordinators - unknown
  Usable Regions - unknown
Cluster:
  FoundationDB processes - unknown
  Zones - unknown
  Machines - unknown
Data:
  Replication health - unknown
  Moving data - unknown
  Sum of key-value sizes - unknown
  Disk space used - unknown
Operating space:
  Unable to retrieve operating space status
Workload:
  Read rate - unknown
  Write rate - unknown
  Transactions started - unknown
  Transactions committed - unknown
  Conflict rate - unknown
Backup and DR:
  Running backups - 0
  Running DRs - 0
Exec’ing into one of the coordinator pods gives similar results:
WARNING: Long delay (Ctrl-C to interrupt)
Using cluster file `/var/dynamic-conf/fdb.cluster'.
The coordinator(s) have no record of this database. Either the coordinator
addresses are incorrect, the coordination state on those machines is missing, or
no database has been created.
  foundationdb-cluster-storage-1.foundationdb-cluster.foundationdb.svc.cluster.local:4501 (reachable)
  foundationdb-cluster-storage-2.foundationdb-cluster.foundationdb.svc.cluster.local:4501 (reachable)
  foundationdb-cluster-storage-3.foundationdb-cluster.foundationdb.svc.cluster.local:4501 (reachable)
Unable to locate the data distributor worker.
Unable to locate the ratekeeper worker.
Unable to locate the consistencyScan worker.
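The coordinators are reachable yet report no record of the database, while status names the cluster controller by IP (172.20.179.0), which with `publicIPSource: service` is presumably a service IP. One hedged way to narrow this down is to resolve each coordinator hostname from inside the cluster and compare the result with the `public_address` each process is configured with. A minimal Python sketch, assuming Python is available in some pod on the cluster network; the function name `resolve_coordinators` is hypothetical, and the resolver is a parameter only so the function can be exercised without cluster DNS:

```python
import socket


def resolve_coordinators(coordinators, resolver=socket.getaddrinfo):
    """Map each "host:port" coordinator to the sorted list of IPs it resolves to.

    Names that fail to resolve map to an empty list instead of raising.
    """
    results = {}
    for coordinator in coordinators:
        host, _, port = coordinator.rpartition(":")
        try:
            infos = resolver(host, int(port), proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            infos = []
        # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address string.
        results[coordinator] = sorted({info[4][0] for info in infos})
    return results
```

Calling it with the three coordinator names from the output above shows which addresses those names currently map to; whether those match the addresses the processes advertise is then easy to compare by eye.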
I am running the FDB cluster on EKS via the operator with the following configuration:
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  labels:
    argocd.argoproj.io/instance: foundationdb.foundationdb.in-cluster
  name: foundationdb-cluster
  namespace: foundationdb
spec:
  automationOptions:
    killProcesses: true
    replacements:
      enabled: true
      maxConcurrentReplacements: 2
  faultDomain:
    key: foundationdb.org/none
  labels:
    filterOnOwnerReference: false
    matchLabels:
      foundationdb.org/fdb-cluster-name: foundationdb-cluster
    processClassLabels:
      - foundationdb.org/fdb-process-class
    processGroupIDLabels:
      - foundationdb.org/fdb-process-group-id
  mainContainer:
    imageConfigs:
      - baseImage: foundationdb/foundationdb
      - tag: 7.2.0
  minimumUptimeSecondsForBounce: 60
  processCounts:
    cluster_controller: 1
    stateless: -1
  processes:
    general:
      customParameters:
        - knob_disable_posix_kernel_aio=1
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: 400m
                  memory: 955Mi
              securityContext:
                runAsUser: 0
            - livenessProbe:
                failureThreshold: 5
                initialDelaySeconds: 15
                periodSeconds: 30
                successThreshold: 1
                tcpSocket:
                  port: 8080
                timeoutSeconds: 1
              name: foundationdb-kubernetes-sidecar
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
          initContainers:
            - name: foundationdb-kubernetes-init
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
                requests:
                  cpu: 100m
                  memory: 128Mi
              securityContext:
                runAsUser: 0
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 16G
  replaceInstancesWhenResourcesChange: false
  routing:
    publicIPSource: service
    useDNSInClusterFile: true
  sidecarContainer:
    enableLivenessProbe: false
    enableReadinessProbe: false
  useExplicitListenAddress: true
  version: 7.2.0
Notably, the routing block is:
routing:
  publicIPSource: service
  useDNSInClusterFile: true
Documentation on this feature is sparse, so I don’t know whether I’m missing a configuration option. I found this test case, which helped a little. I also checked FoundationDB’s own hostname validation, and these hostnames pass its regex, so the hostnames themselves should be valid. I’ve read just about every forum post and issue that mentions the DNS option.
My ultimate use case is to connect to the FDB cluster from another peered cluster via Cilium Cluster Mesh. I can do that with the pod IPs, but I need predictable service names for this solution to be viable.
Does anyone have any suggestions? I’ve been trying to figure this out for a while now and I’m out of ideas. Thanks.