Deployment with custom fdb.cluster file

Hi,

I’m trying to deploy a new cluster that is configured by a provisioning tool (Ansible) instead of fdbcli. However, once I create the fdb.cluster file manually, the client refuses to connect and reports “The database is unavailable”.

Here is my setup.

One server with 16 GB RAM and 4 cores. I have 4 processes running as

[fdbserver.4500]
[fdbserver.4501]
[fdbserver.4502]
[fdbserver.4503]

and the address settings are

public_address = auto:$ID
listen_address = public

My fdb.cluster file looks like this
localtest:localtest@0.0.0.0:4500

I tried many variations of the cluster file, but nothing works.

Additionally, do I also have to put the three other processes into the cluster file, or are they discovered on each server directly? If so, I’m not sure how the client would know about them.

Thanks for any pointers

The address listed in the cluster file needs to match the process’s public address exactly. If the public address is 10.1.2.3:4500, then 0.0.0.0:4500 in the cluster file won’t match. Every time I’ve run into this, it’s been because I accidentally wrote the wrong IP address (127.0.0.1 vs 0.0.0.0 vs the public IP) in the cluster file compared with what the process is actually configured to listen on.
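To make the exact-match rule concrete, here is a minimal shell sketch that lists the coordinator addresses in a cluster file so they can be compared with each process’s public address. The helper names cluster_coordinators and address_in_cluster_file are invented for this illustration, and the sketch assumes a single-line cluster file of the form description:ID@ip:port,ip:port.

```shell
# Print the coordinator addresses from a cluster file, one per line.
# Assumes a single-line file: description:ID@ip:port[,ip:port...]
cluster_coordinators() {
    # Drop everything up to and including '@', then split on commas.
    sed -e 's/^[^@]*@//' "$1" | tr ',' '\n'
}

# Succeed only if the given ip:port appears verbatim among the coordinators.
# $1 = address to look for, $2 = path to the cluster file (hypothetical helper)
address_in_cluster_file() {
    cluster_coordinators "$2" | grep -qxF -- "$1"
}
```

With a cluster file that lists 10.1.2.3:4500, calling address_in_cluster_file 10.1.2.3:4500 on it succeeds, while 0.0.0.0:4500 or 127.0.0.1:4500 fails, because the comparison is a plain string match.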

If you post your exact fdbserver process invocation line(s), then I can offer more concrete advice.

On startup, each process will try to connect to the processes listed in the cluster file, and they can then learn about each other and self-organize, so you don’t need to list every process there. Using just one coordinator would work, but if that one process dies, your cluster becomes unavailable. For local testing, that’s probably fine. For production, you’d probably want 3 or more coordinators.
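Since the original question is about generating the cluster file from a provisioning tool, here is a rough shell sketch of writing a multi-coordinator file. The function name write_cluster_file and the 10.1.2.x addresses are placeholders for illustration only; real coordinators would normally be on separate machines.

```shell
# write_cluster_file FILE DESCRIPTION ID ADDR [ADDR...]
# Writes a one-line cluster file listing every ADDR (ip:port) as a coordinator.
write_cluster_file() {
    file=$1
    description=$2
    id=$3
    shift 3
    # "$*" joins the remaining arguments using the first character of IFS.
    coordinators=$(IFS=,; printf '%s' "$*")
    printf '%s:%s@%s\n' "$description" "$id" "$coordinators" > "$file"
}
```

Calling write_cluster_file ./fdb.cluster localtest localtest 10.1.2.3:4500 10.1.2.4:4500 10.1.2.5:4500 writes the single line localtest:localtest@10.1.2.3:4500,10.1.2.4:4500,10.1.2.5:4500. The same file then needs to be present on every server and client. On a cluster that is already running, the coordinators command in fdbcli is the usual way to change the coordinator set, rather than editing the file by hand.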

Appreciate the pointers. I think I’m getting closer, but still haven’t managed to get it to work.

$ ps aux | grep fdb
root     14398  0.0  0.0  20476  2392 ?        Ss   00:51   0:00 /usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize
foundat+ 14399  0.3  0.0 108164  9488 ?        Sl   00:51   0:00 /usr/lib/foundationdb/backup_agent/backup_agent --cluster_file /etc/foundationdb/fdb.cluster --logdir /var/log/foundationdb
foundat+ 14400  0.4  0.1 254476 17404 ?        Sl   00:51   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address auto:4500 --storage_memory 1GiB
foundat+ 14401  0.9  0.1 122956 18040 ?        Sl   00:51   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4501 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address auto:4501 --storage_memory 1GiB
foundat+ 14402  0.3  0.0 122748 14736 ?        Sl   00:51   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4502 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address auto:4502 --storage_memory 1GiB
foundat+ 14404  0.3  0.0 122744 14688 ?        Sl   00:51   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4503 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address auto:4503 --storage_memory 1GiB

I also verified which address and ports FDB is actually listening on.

$ sudo netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.240.2.222:4500       0.0.0.0:*               LISTEN      14400/fdbserver
tcp        0      0 10.240.2.222:4501       0.0.0.0:*               LISTEN      14401/fdbserver
tcp        0      0 10.240.2.222:4502       0.0.0.0:*               LISTEN      14402/fdbserver
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      748/sshd
tcp        0      0 10.240.2.222:4503       0.0.0.0:*               LISTEN      14404/fdbserver
tcp6       0      0 :::22                   :::*                    LISTEN      748/sshd

My new fdb.cluster is
localtest:localtest@10.240.2.222:4500

And the client is still not willing to connect

$ fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status

Using cluster file `/etc/foundationdb/fdb.cluster'.

The coordinator(s) have no record of this database. Either the coordinator
addresses are incorrect, the coordination state on those machines is missing, or
no database has been created.

  10.240.2.222:4500  (reachable)

What am I missing?

Checking the code, using “auto” means that we’ll try to connect to coordinator[0] and then use the local IP of that connection as your public address. That also means it’s probably not the most reliable way to specify an IP that has to match the cluster file.

For the sake of sanity, can you just use 127.0.0.1 instead of auto as your public address, and verify that works? We can then move on to how to get things to work with auto.
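The “auto” behaviour described above can be approximated from a shell on Linux by asking the kernel which source address it would use to reach the coordinator. This is only a sketch and not FoundationDB’s own logic; it assumes the iproute2 ip tool is installed, and route_src is a helper name invented here.

```shell
# Read `ip route get` output on stdin and print the value after "src",
# i.e. the local address the kernel would use for that destination.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Example on a Linux host (not run here; the IP is the one from this thread):
#   ip route get 10.240.2.222 | route_src
```

For a process that is itself meant to be the coordinator, the printed address should be the same IP that is written in the cluster file; if it differs, a process started with auto would likely advertise an address that does not match its own coordinator entry.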

I did, but it still doesn’t work.

FDB server

$ ps aux | grep fdb
root     11618  0.0  0.0  20476  2588 ?        Ss   20:36   0:00 /usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize
foundat+ 11620  0.1  0.0 108164  9148 ?        Sl   20:36   0:00 /usr/lib/foundationdb/backup_agent/backup_agent --cluster_file /etc/foundationdb/fdb.cluster --logdir /var/log/foundationdb
foundat+ 11744  0.7  0.1 188588 17580 ?        Sl   20:37   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4502 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address 127.0.0.1:4502 --storage_memory 1GiB
foundat+ 11745  0.3  0.1 188612 17344 ?        Sl   20:37   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4503 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address 127.0.0.1:4503 --storage_memory 1GiB
foundat+ 11746  0.2  0.1 188440 15556 ?        Sl   20:37   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4501 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address 127.0.0.1:4501 --storage_memory 1GiB
foundat+ 11747  0.4  0.1 254272 15976 ?        Sl   20:37   0:00 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --logsize 50MiB --machine_id instance-1 --maxlogssize 1GiB --memory 13GiB --public_address 127.0.0.1:4500 --storage_memory 1GiB

Config

public_address = 127.0.0.1:$ID
listen_address = public

Cluster file
localtest:localtest@127.0.0.1:4500

And fdbcli

nan@instance-1:~$ fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

The coordinator(s) have no record of this database. Either the coordinator
addresses are incorrect, the coordination state on those machines is missing, or
no database has been created.

  127.0.0.1:4500  (reachable)

Here is full foundationdb.conf

## foundationdb.conf
##
## Configuration file for FoundationDB server processes
## Full documentation is available at
## https://apple.github.io/foundationdb/configuration.html#the-configuration-file

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
## by default, restart_backoff = restart_delay_reset_interval = restart_delay
# initial_restart_delay = 0
# restart_backoff = 60
# restart_delay_reset_interval = 60
cluster_file = /etc/foundationdb/fdb.cluster
# delete_envvars =
# kill_on_configuration_change = true

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public_address = 127.0.0.1:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
logsize = 50MiB
maxlogssize = 1GiB
machine_id = instance-1
# datacenter_id =
# class =
memory = 13GiB
storage_memory = 1GiB
# metrics_cluster =
# metrics_prefix =

## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section
[fdbserver.4500]
[fdbserver.4501]
[fdbserver.4502]
[fdbserver.4503]

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]

Does it work if you configure the database, as in this answer by Alec?

Ah yes.

After running
fdbcli --exec "configure new single memory ; status"
it works.

sigh
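For a provisioning tool, the fix above could be wrapped in a step that only creates the database when the coordinators report that none exists. The following is a minimal sketch under that assumption: the function names are made up, the status text matched is the message quoted earlier in this thread, and “single memory” is just the configuration used here for local testing.

```shell
# Succeed if the status text on stdin says the coordinators know no database.
needs_configure_new() {
    grep -q 'have no record of this database'
}

# Hypothetical provisioning step: create the database only on a fresh cluster.
# Defined here but not called; it needs fdbcli and a running fdbserver.
bootstrap_if_new() {
    if fdbcli --exec "status" 2>&1 | needs_configure_new; then
        fdbcli --exec "configure new single memory"
    fi
}
```

The same message is also shown when the coordinator addresses are wrong or their state is missing, so a step like this is only appropriate for a cluster that is known to be brand new.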

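Shame on me for not actually reading the status output: “no record of this database” with the coordinator listed as reachable means you were able to run and find a coordinator just fine, and the database simply hadn’t been created yet. Thanks AJ!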
