I have successfully managed to run FoundationDB on Kubernetes with redundancy_mode single, and all four pods are running fine. The first problem is that the pod names are in the format below; as you can see, we don't actually get to know which FoundationDB component a given pod is running:
foundationdbcluster-sample-1 2/2 Running 0 15h
foundationdbcluster-sample-2 2/2 Running 0 15h
foundationdbcluster-sample-3 2/2 Running 0 15h
foundationdbcluster-sample-4 2/2 Running 0 15h
Another issue that I am facing is that when I exec into a pod and try to access fdbcli, I get the error below:
root@foundationdbcluster-sample-1:/var/dynamic-conf# fdbcli
Using cluster file `/var/dynamic-conf/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb>
Here we can see that the database is unavailable, and that's why when we try to run another command in fdbcli we get the error below:
fdb> writemode on
fdb> set name vivek
WARNING: Long delay (Ctrl-C to interrupt)
The database is unavailable; type `status' for more information.
Now if I go ahead and try to configure a new database using the command configure new ssd single, below is the output that I get:
fdb> configure new ssd single
WARNING: Long delay (Ctrl-C to interrupt)
The database is unavailable; type `status' for more information.
My question is how to go about setting up the new database. Below is the content of /var/dynamic-conf/fdb.cluster:
root@foundationdbcluster-sample-1:/var/dynamic-conf# more fdb.cluster
foundationdbcluster_sample:56TI2smTyVsGd5Xav0yu99ZAMPTA7z7s@10.244.0.16:4500:tls
root@foundationdbcluster-sample-1:/var/dynamic-conf#
root@foundationdbcluster-sample-1:/var/dynamic-conf#
root@foundationdbcluster-sample-1:/var/dynamic-conf# more fdbmonitor.conf
[general]
kill_on_configuration_change = false
restart_delay = 60
[fdbserver.1]
command = /var/dynamic-conf/bin/6.2.11/fdbserver
cluster_file = /var/fdb/data/fdb.cluster
seed_cluster_file = /var/dynamic-conf/fdb.cluster
public_address = 10.244.0.20:4500:tls
class = storage
datadir = /var/fdb/data
logdir = /var/log/fdb-trace-logs
loggroup = foundationdbcluster-sample
locality_instance_id = 1
locality_machineid = foundationdbcluster-sample-1
locality_zoneid = foundationdbcluster-sample-1
knob_disable_posix_kernel_aio=1
root@foundationdbcluster-sample-1:/var/dynamic-conf#
This seems to work if I create the cluster using redundancy_mode double. I am creating the local cluster using this file.
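For reference, a minimal cluster spec along these lines reproduces this setup (field names follow the operator's v1beta1 samples and may differ between operator versions, so treat this as a sketch rather than the exact file I am using):
cat <<EOF | kubectl apply -f -
apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: foundationdbcluster-sample
spec:
  version: 6.2.11
  processCounts:
    storage: 4        # assumption: explicit storage count to get four pods
  databaseConfiguration:
    redundancy_mode: single
    storage_engine: ssd
EOF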
Now the first problem is that the pod names are in the format below; as you can see, we don't actually get to know which FoundationDB component a given pod is running
All of the pods are running fdbserver processes. I’ve been thinking about some changes to the name format to clarify what role they’re serving and make the instance IDs more consistent. I’ll file an issue on that today.
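In the meantime, the pods should already carry labels for the instance ID and process class, so something like this shows which role each pod has (the exact label keys depend on the operator version, so this is just a sketch):
kubectl get pods -l fdb-cluster-name=foundationdbcluster-sample -L fdb-process-class,fdb-instance-id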
Another issue that I am facing is that when I exec into a pod and try to access fdbcli, I get the error below
The operator should create the initial database configuration for you. If you want to change it, you can run configure ssd single, but it’s better to do that by editing the spec in Kubernetes and letting the operator roll out the new configuration.
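For example, assuming the v1beta1 field names, changing the redundancy mode through the spec would look roughly like this, and the operator would then run the configure command itself:
kubectl patch foundationdbcluster foundationdbcluster-sample --type merge \
  -p '{"spec": {"databaseConfiguration": {"redundancy_mode": "double"}}}'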
Can you run kubectl describe foundationdbcluster foundationdbcluster-sample
and see if the describe output indicates that the cluster has been created successfully? The status will indicate whether the cluster has been successfully reconciled, and will have events for the major stages of reconciliation, including each attempt to configure the database. It’s possible that the configuration command is failing.
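For example (the status field names here are from memory, so treat them as a sketch):
kubectl get foundationdbcluster foundationdbcluster-sample -o jsonpath='{.status.generations}'
kubectl get events --field-selector involvedObject.name=foundationdbcluster-sample --sort-by=.lastTimestamp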
There will also be trace logs in /var/log/fdb-trace-logs
. Those logs may provide more information about connection errors, TLS errors, or something else that’s preventing the configuration from working.
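For example, from inside the pod (the trace files are XML by default, so this is just one way to surface the high-severity events):
grep -h 'Severity="40"' /var/log/fdb-trace-logs/*.xml | tail -n 20
grep -h 'Severity="30"' /var/log/fdb-trace-logs/*.xml | tail -n 20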
What kind of Kubernetes environment are you using for these tests?
Hi @john_brownlee
I am not sure about the CR status when I created the cluster with redundancy_mode single, but I have now changed it to double and things seem to be working for now. There is a chance I will have to run this in single redundancy_mode again, and I will update here if I face any issues.
I am testing these things on a cluster that is provisioned through Kind.
OK. I’ll try reproducing this problem in my local environment. I haven’t done much testing with a single replication config, since it’s not a configuration we use here, but if you have a use case for it we should be able to support it. The only hard blocker with that I’m aware of is that any change that requires recreating pods would take the database unavailable, because we would lose the only coordinator. We could address that by using more coordinators in single replication mode, or by adding an option to handle those kind of updates by replacing the old pods rather than deleting and recreating them.
Hi @john_brownlee
Yeah, the problem of losing the coordinator makes sense to me now. For now I am going ahead with double redundancy mode, and I will update here if I have to get things working with single redundancy mode.
Hi @john_brownlee
So, I observed a pattern here: even if we run the cluster in double redundancy mode, the pods reach the Running state, but when we exec into a pod to access fdbcli we get the errors below for some time.
Either
Unexpected error loading cluster file `/var/dynamic-conf/fdb.cluster': 1513 File could not be read
or
Using cluster file `/var/dynamic-conf/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
and after some time we get the "database is available" message and are able to access the database:
Using cluster file `/var/dynamic-conf/fdb.cluster'.
The database is available.
Welcome to the fdbcli. For help, type `help'.
This doesn't happen if we are running FoundationDB in single redundancy mode; in that case we just keep getting the error message below forever:
Using cluster file `/var/dynamic-conf/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb>
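A quick way to watch for that transition from outside the pod, assuming the main container is named foundationdb (just a sketch):
kubectl exec foundationdbcluster-sample-1 -c foundationdb -- fdbcli --exec 'status minimal'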
I hope this helps you debug the issue further.
Are you able to see logs from the operator, or the describe output from kubectl? The first case looks like it hasn’t completed reconciliation.
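If you can grab them, something like this would help (the deployment name depends on how the operator was installed, so it is only a guess here):
kubectl logs deployment/fdb-kubernetes-operator-controller-manager --tail=200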