I’ve been attempting to spin up a 5-server cluster and I’m running into issues. Each server has 12 vCPUs and 48 GB of RAM.
Here are the steps I’ve taken:
- Install fdb via “foundationdb-client_5.2.5-1_amd64.deb” and “foundationdb-server_5.2.5-1_amd64.deb”
- Update foundationdb.conf to spawn 10 fdbserver processes on each server, 2 of them with class transaction and 3 with class stateless (RAM per server process is set to 4Gi):
[fdbserver.4500]
[fdbserver.4501]
class = transaction
[fdbserver.4502]
class = transaction
[fdbserver.4503]
class = stateless
[fdbserver.4504]
class = stateless
[fdbserver.4505]
class = stateless
[fdbserver.4506]
[fdbserver.4507]
[fdbserver.4508]
[fdbserver.4509]
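To make the intended layout explicit, here is a minimal Python sketch that renders the same per-port sections: port 4500 with no class, the next two ports as transaction, the next three as stateless, and the rest with no class. The function name `render_fdbserver_sections` and its parameters are hypothetical helpers of mine, not part of FoundationDB, and the sketch only produces text; it does not touch the rest of foundationdb.conf.

```python
def render_fdbserver_sections(base_port=4500, total=10, transaction=2, stateless=3):
    """Return [fdbserver.<port>] sections as text (hypothetical helper).

    Layout assumed from the post: the first process has no class, the next
    `transaction` processes are class = transaction, the following
    `stateless` processes are class = stateless, and the remainder have
    no class set.
    """
    lines = []
    for i in range(total):
        lines.append(f"[fdbserver.{base_port + i}]")
        if 1 <= i <= transaction:
            lines.append("class = transaction")
        elif transaction < i <= transaction + stateless:
            lines.append("class = stateless")
    return "\n".join(lines)


# Print the sections for one machine so they can be compared with the config.
print(render_fdbserver_sections())
```

Generating the sections this way mainly helps keep the five machines identical; the class assignment itself is still just the assumption stated above.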
I’m then able to get all 10 server processes running on each machine and I’m able to:
- set the storage engine as ssd
- configure all the coordinators with IP1:4500 IP2:4500 IP3:4500 IP4:4500 IP5:4500
I then tried to set redundancy to triple, and things stopped working on all machines.
The output from fdbcli was:
Coordination state changed
fdb> configure triple
WARNING: Long delay (Ctrl-C to interrupt)
The database is unavailable; type `status' for more information.
WARNING: The cluster file is not up to date. Type 'status' for more information.
^Cfdb> status
WARNING: Long delay (Ctrl-C to interrupt)
# fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb> status
WARNING: Long delay (Ctrl-C to interrupt)
Using cluster file `/etc/foundationdb/fdb.cluster'.
Recruiting new transaction servers.
Need at least 3 log servers, 1 proxies and 1 resolvers.
Have 10 processes on 1 machines.
Timed out trying to retrieve storage servers.
fdb> configure proxies=5
WARNING: Long delay (Ctrl-C to interrupt)
The database is unavailable; type `status' for more information.
^Cfdb> configure logs=8
WARNING: Long delay (Ctrl-C to interrupt)
^Cfdb> status
WARNING: Long delay (Ctrl-C to interrupt)
Using cluster file `/etc/foundationdb/fdb.cluster'.
Recruiting new transaction servers.
Need at least 3 log servers, 1 proxies and 1 resolvers.
Have 10 processes on 1 machines.
Timed out trying to retrieve storage servers.
I’m now getting this status:
"messages" : [
   {
      "description" : "Unable to start immediate priority transaction after 5 seconds.",
      "name" : "immediate_priority_transaction_start_probe_timeout"
   },
   {
      "description" : "Unable to start default priority transaction after 5 seconds.",
      "name" : "transaction_start_probe_timeout"
   },
   {
      "description" : "Unable to start batch priority transaction after 5 seconds.",
      "name" : "batch_priority_transaction_start_probe_timeout"
   },
   {
      "description" : "Unable to read after 5 seconds.",
      "name" : "read_probe_timeout"
   },
   {
      "description" : "Unable to commit after 5 seconds.",
      "name" : "commit_probe_timeout"
   },
   {
      "description" : "Timed out trying to retrieve storage servers.",
      "name" : "storage_servers_error"
   },
   {
      "description" : "Unable to retrieve all status information.",
      "name" : "status_incomplete",
      "reasons" : [
         {
            "description" : "Unable to determine if database is locked after 5 seconds."
         },
         {
            "description" : "Unable to retrieve layer status (Operation aborted because the transaction timed out)."
         },
         {
            "description" : "Unknown performance state."
         },
         {
            "description" : "Unknown read state."
         }
      ]
   }
],
"recovery_state" : {
   "description" : "Recruiting new transaction servers.",
   "name" : "recruiting_transaction_servers",
   "required_logs" : 3,
   "required_proxies" : 1,
   "required_resolvers" : 1
},
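In case it helps anyone reading the status above, here is a rough Python sketch that boils a status fragment like this down to the message names and the recovery requirements. The function `summarize_status` is a hypothetical helper, not an FDB API; it assumes it is handed the object that directly contains the "messages" and "recovery_state" keys, and the sample below is a trimmed copy of the values from my output.

```python
import json


def summarize_status(section):
    """Summarize a status fragment (hypothetical helper, not an FDB API).

    `section` is assumed to be the dict that directly holds the
    "messages" and "recovery_state" keys shown in the post.
    """
    names = [m.get("name", "?") for m in section.get("messages", [])]
    recovery = section.get("recovery_state", {})
    # Collect the required_* counters reported by the recovery state.
    required = {k: v for k, v in recovery.items() if k.startswith("required_")}
    return {
        "message_names": names,
        "recovery": recovery.get("name"),
        "required": required,
    }


# Trimmed sample built from the values in the status output above.
sample = json.loads("""
{
  "messages": [
    {"description": "Unable to commit after 5 seconds.",
     "name": "commit_probe_timeout"},
    {"description": "Timed out trying to retrieve storage servers.",
     "name": "storage_servers_error"}
  ],
  "recovery_state": {
    "description": "Recruiting new transaction servers.",
    "name": "recruiting_transaction_servers",
    "required_logs": 3,
    "required_proxies": 1,
    "required_resolvers": 1
  }
}
""")

summary = summarize_status(sample)
print(summary["recovery"], summary["required"])
```

The point is only to make the recovery name and the required role counts easy to compare between runs; it does not diagnose anything by itself.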
What did I do wrong? Should I be setting logs=8 and proxies=5 before trying to configure redundancy? Is there some status I should have waited for after setting coordinators?
It would also be useful to understand whether my configuration is optimal. Should I be setting the classes differently? Are more, smaller machines better?