How to remove process from test server

peter · April 19, 2021, 10:05am

I am running a local foundationdb instance for experimenting with the API.

I changed the number of processes on my local server by editing /etc/foundationdb/foundationdb.conf to use 4 processes:

[fdbserver.4500]
[fdbserver.4501]
[fdbserver.4502]
[fdbserver.4503]

This worked and foundationdb used 4 CPU cores.
When I wanted to reduce it to 3 processes, I removed the last line ([fdbserver.4503]) from the config and restarted the server.

Now I get the following error:

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status

Using cluster file `/etc/foundationdb/fdb.cluster'.

Locking coordination state. Verify that a majority of coordination server
processes are active.

127.0.0.1:4500 (reachable)

Unable to locate the data distributor worker.

Unable to locate the ratekeeper worker.

How do I correctly reduce the number of processes on my test server?

gaurav · April 19, 2021, 1:05pm

Did you have only a single coordinator in this setup? What does your cluster (fdb.cluster) config file look like?

peter · April 19, 2021, 1:25pm

Yes, this is a single machine / single coordinator setup on my local machine. I just want to get familiar with the API at the moment.

Here is my cluster file:

U6xiE0jK:Py4rIIul@127.0.0.1:4500

And here is my config file (no changes from default, except for number of processes):

## foundationdb.conf
##
## Configuration file for FoundationDB server processes
## Full documentation is available at
## https://apple.github.io/foundationdb/configuration.html#the-configuration-file

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
## by default, restart_backoff = restart_delay_reset_interval = restart_delay
# initial_restart_delay = 0
# restart_backoff = 60
# restart_delay_reset_interval = 60
cluster_file = /etc/foundationdb/fdb.cluster
# delete_envvars =
# kill_on_configuration_change = true

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
# logsize = 10MiB
# maxlogssize = 100MiB
# machine_id = 
# datacenter_id = 
# class = 
# memory = 8GiB
# storage_memory = 1GiB
# cache_memory = 2GiB
# metrics_cluster = 
# metrics_prefix =

## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section
[fdbserver.4500]
[fdbserver.4501]
[fdbserver.4502]
[fdbserver.4503]

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]

I installed the server from foundationdb-server_6.3.12-1_amd64.deb following the steps in "Getting Started on Linux" (FoundationDB 6.3 documentation).

gaurav · April 19, 2021, 2:01pm

I think the issue could be that this db has a replication factor of 1, so it cannot tolerate even one machine “failure”.

You should bring back that line to make the cluster healthy. Then:

- For controlled machine exclusions, you can use the exclude command from fdbcli. This first moves the data off the process and then excludes it from the cluster. Once that is done, you can remove the process line for it from the config.
- To tolerate uncontrolled failures, like the one you simulated by removing the process line, raise the replication factor to 2 or more from fdbcli, and then try the file-based removal again. This time the cluster should stay available.
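The two options above can be sketched as fdbcli invocations. This is a minimal sketch, not a tested procedure: the helper names `controlled_removal` and `replication_change` are made up for illustration, the address 127.0.0.1:4503 and the cluster file path are taken from this thread, and the code only builds the command lines (it does not run them).

```python
import shlex

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # path from this thread


def fdbcli(command, cluster_file=CLUSTER_FILE):
    """Build an argument list that runs one fdbcli command non-interactively."""
    return ["fdbcli", "-C", cluster_file, "--exec", command]


def controlled_removal(address, cluster_file=CLUSTER_FILE):
    """Commands for option 1: exclude the process, then check cluster status.

    Only after the exclude has finished should the [fdbserver.<port>]
    line be removed from foundationdb.conf.
    """
    return [
        fdbcli("exclude " + address, cluster_file),
        fdbcli("status details", cluster_file),
    ]


def replication_change(mode, cluster_file=CLUSTER_FILE):
    """Command for option 2: change the redundancy mode (e.g. 'double')."""
    if mode not in ("single", "double", "triple"):
        raise ValueError("unsupported redundancy mode in this sketch: " + mode)
    return fdbcli("configure " + mode, cluster_file)


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for argv in controlled_removal("127.0.0.1:4503"):
        print(shlex.join(argv))
    print(shlex.join(replication_change("double")))
```

Printing the commands instead of executing them is deliberate: on a single-coordinator test cluster it is worth reading each one before running it.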

Be careful not to exclude your coordinator process. You can increase the number of coordinator processes with the coordinators command in fdbcli, which makes the cluster more resilient to the exclusion or failure of a coordinator.
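One way to check this before excluding anything is to compare the address against the cluster file. The sketch below assumes the usual `description:ID@address,address,...` layout seen in the cluster file posted earlier in this thread; the function names are hypothetical, skipping `#` lines is an assumption, and suffixes such as `:tls` are not handled.

```python
def parse_cluster_string(text):
    """Split 'description:ID@host:port,host:port' into its three parts."""
    # Use the first non-empty line that is not a comment (assumed '#' prefix).
    line = next(
        stripped
        for stripped in (raw.strip() for raw in text.splitlines())
        if stripped and not stripped.startswith("#")
    )
    prefix, _, addresses = line.partition("@")
    description, _, cluster_id = prefix.partition(":")
    coordinators = [addr.strip() for addr in addresses.split(",") if addr.strip()]
    return description, cluster_id, coordinators


def is_coordinator(address, cluster_text):
    """Return True if the given host:port is listed as a coordinator."""
    return address in parse_cluster_string(cluster_text)[2]


if __name__ == "__main__":
    cluster_text = "U6xiE0jK:Py4rIIul@127.0.0.1:4500"  # cluster file from this thread
    for port in (4500, 4503):
        address = "127.0.0.1:%d" % port
        print(address, "coordinator" if is_coordinator(address, cluster_text) else "not a coordinator")
```

With the cluster file from this thread, only 127.0.0.1:4500 is a coordinator, so removing the 4503 process does not touch the coordinator itself.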

Also see this.

ajbeamon · April 19, 2021, 4:49pm

Gaurav’s recommendations are good. To help you avoid a potential pitfall, I’ll point out one other detail that may affect you if you try to use a single machine for double replication. If you try this with the configuration you posted above, I believe it will fail, because double replication requires multiple machines. To make it work, you would need to simulate having different machines by configuring distinct machine IDs on your processes. For example:

[fdbserver.4500]
machine_id = 0

[fdbserver.4501]
machine_id = 1

[fdbserver.4502]
machine_id = 2

[fdbserver.4503]
machine_id = 3
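To see how many "machines" a given foundationdb.conf would present, something like the following could be used. It is only a sketch under stated assumptions: the function names are hypothetical, the file is read with Python's configparser (which happens to accept this INI-style layout), a machine_id set in the shared [fdbserver] section is treated as applying to every process, and processes with no machine_id at all are counted as one shared machine, which is my reading of the documented default.

```python
import configparser

SAMPLE_CONF = """\
[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID

[fdbserver.4500]
machine_id = 0

[fdbserver.4501]
machine_id = 1

[fdbserver.4502]
"""


def effective_machine_ids(conf_text):
    """Map each [fdbserver.<id>] section to its machine_id (None if unset)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # A machine_id in the shared [fdbserver] section is the default for all.
    shared = parser.get("fdbserver", "machine_id", fallback=None)
    return {
        section: parser.get(section, "machine_id", fallback=shared)
        for section in parser.sections()
        if section.startswith("fdbserver.")
    }


def distinct_machine_count(conf_text):
    """Count distinct machines; all processes without an ID count as one."""
    ids = effective_machine_ids(conf_text)
    explicit = {value for value in ids.values() if value is not None}
    has_unset = any(value is None for value in ids.values())
    return len(explicit) + (1 if has_unset else 0)


if __name__ == "__main__":
    print(effective_machine_ids(SAMPLE_CONF))
    print("distinct machines:", distinct_machine_count(SAMPLE_CONF))
```

If the count is 1, as it is with the default config posted earlier in this thread, double replication would not have the multiple machines it needs.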

peter · April 19, 2021, 9:59pm

Thanks for the answers.

I actually changed the number of processes to see how fdb scales as the number of processes increases.
So changing the configuration back to the original was not meant to simulate failures; I just wanted to go back to the defaults without reinstalling fdb.

With gaurav's answer, I was able to remove one of the four processes, but to get back to the initial config I had to reinstall the server.
