How to add a server container to a cluster

Using same docker image of

I’m trying to add a container to a cluster by following these instructions:

Start the first server in a container.

$ docker run -ti --rm --name=fdb -h fdb  hiroshi3110/foundationdb:5.1.5-1_ubuntu-16.04 bash
root@fdb:~# cat /etc/foundationdb/fdb.cluster 
root@fdb:~# getent hosts fdb      fdb
root@fdb:~# echo 8YoJRcwD:WZ545WZr@ > /etc/foundationdb/fdb.cluster
root@fdb:~# /usr/lib/foundationdb/fdbmonitor 
Time="1525953600.858815" Severity="10" LogGroup="default" Process="fdbmonitor": Started FoundationDB Process Monitor 5.1 (v5.1.5)
Time="1525953600.865933" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf file /etc/foundationdb/foundationdb.conf
Time="1525953600.866062" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf dir /etc/foundationdb (2)
Time="1525953600.866092" Severity="10" LogGroup="default" Process="fdbmonitor": Loading configuration /etc/foundationdb/foundationdb.conf
Time="1525953600.868531" Severity="10" LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
Time="1525953600.869477" Severity="10" LogGroup="default" Process="fdbmonitor": Starting fdbserver.4500
Time="1525953600.870077" Severity="10" LogGroup="default" Process="backup_agent.1": Launching /usr/lib/foundationdb/backup_agent/backup_agent (16) for backup_agent.1
Time="1525953600.873488" Severity="10" LogGroup="default" Process="fdbserver.4500": Launching /usr/sbin/fdbserver (17) for fdbserver.4500
Time="1525953600.922129" Severity="10" LogGroup="default" Process="fdbserver.4500": FDBD joined cluster.

It seems to work:

$ docker exec -ti fdb fdbcli              
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type `help'.
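
For reference, a cluster file line has the form description:ID@address:port, and a second server joins the cluster by using the same line, pointing at the first server's coordinator address. A minimal sketch of building such a line follows. The helper name make_cluster_line is hypothetical, and the default port 4500 is assumed from the fdbserver.4500 process name in the logs above.

```shell
# Hypothetical helper: print a single-coordinator cluster file line.
#   $1 description, $2 ID, $3 coordinator IP address, $4 port (assumed default: 4500)
make_cluster_line() {
    printf '%s:%s@%s:%s\n' "$1" "$2" "$3" "${4:-4500}"
}

# Possible usage inside a container, assuming the coordinator's hostname is fdb
# and that getent prints the IP address in its first column:
#   make_cluster_line 8YoJRcwD WZ545WZr "$(getent hosts fdb | awk '{print $1}')" \
#       > /etc/foundationdb/fdb.cluster
```

The address is passed in as an argument, so the same helper works whether it comes from getent, from docker inspect, or from a fixed value.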

Then, add another server to the cluster

$ docker run -ti --rm --name=fdb2 -h fdb2 hiroshi3110/foundationdb:5.1.5-1_ubuntu-16.04 bash 
root@fdb2:~# echo 8YoJRcwD:WZ545WZr@ > /etc/foundationdb/fdb.cluster
root@fdb2:~# cat /etc/foundationdb/fdb.cluster 
root@fdb2:~# /usr/lib/foundationdb/fdbmonitor
Time="1525953996.607291" Severity="10" LogGroup="default" Process="fdbmonitor": Started FoundationDB Process Monitor 5.1 (v5.1.5)
Time="1525953996.614169" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf file /etc/foundationdb/foundationdb.conf
Time="1525953996.614269" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf dir /etc/foundationdb (2)
Time="1525953996.614329" Severity="10" LogGroup="default" Process="fdbmonitor": Loading configuration /etc/foundationdb/foundationdb.conf
Time="1525953996.615638" Severity="10" LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
Time="1525953996.616053" Severity="10" LogGroup="default" Process="fdbmonitor": Starting fdbserver.4500
Time="1525953996.618680" Severity="10" LogGroup="default" Process="backup_agent.1": Launching /usr/lib/foundationdb/backup_agent/backup_agent (13) for backup_agent.1
Time="1525953996.620875" Severity="10" LogGroup="default" Process="fdbserver.4500": Launching /usr/sbin/fdbserver (14) for fdbserver.4500
Time="1525953996.655526" Severity="10" LogGroup="default" Process="fdbserver.4500": FDBD joined cluster.

The database is now unavailable:

$ docker exec -ti fdb fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Unable to locate the master worker.

Unable to retrieve all status information.

  Redundancy mode        - single
  Storage engine         - memory
  Coordinators           - 1

  FoundationDB processes - 1
  Machines               - 1
  Memory availability    - 3.4 GB per process on machine with least available
                           >>>>> (WARNING: 4.0 GB recommended) <<<<<
  Fault Tolerance        - 0 machines
  Server time            - 05/10/18 12:08:58

  Replication health     - unknown
  Moving data            - unknown
  Sum of key-value sizes - unknown
  Disk space used        - unknown

Operating space:
  Unable to retrieve operating space status

  Read rate              - unknown
  Write rate             - unknown
  Transactions started   - unknown
  Transactions committed - unknown
  Conflict rate          - unknown

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  (  2% cpu;  5% machine; 0.000 Gbps;  0% disk IO; 0.3 GB / 3.4 GB RAM  )

Coordination servers:  (reachable)

Client time: 05/10/18 12:08:55


What’s wrong with those steps?

I had a situation with similar symptoms while setting up clusters from pre-baked images. What worked for me:

  1. Stop the foundationdb service on the new node.
  2. Wipe its local storage (sudo rm -rf /var/lib/foundationdb/data/4500/).
  3. Restart the service.
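
The three steps above can be sketched as a small shell helper. This is only a sketch under stated assumptions, not the script from this post. The function name wipe_fdb_data is hypothetical, the default path is the one quoted in step 2, and the service commands shown in the comments assume the init script installed by the Debian package.

```shell
# Hypothetical helper: remove the local data directory of one fdbserver process.
#   $1 data directory (assumed default: /var/lib/foundationdb/data/4500)
wipe_fdb_data() {
    data_dir="${1:-/var/lib/foundationdb/data/4500}"
    # Guard against wiping the filesystem root by mistake.
    if [ "$data_dir" = "/" ]; then
        echo "refusing to remove $data_dir" >&2
        return 1
    fi
    rm -rf -- "$data_dir"
    echo "removed $data_dir"
}

# Possible usage on the new node, run as root or via sudo:
#   service foundationdb stop     # step 1
#   wipe_fdb_data                 # step 2
#   service foundationdb start    # step 3
```

In a container where fdbmonitor is started by hand, as in the question, only the wipe step applies: run it before launching fdbmonitor.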

To put this in context, here is the larger script (part of a Terraform config):


Removing the data directory of the second process before adding it to the cluster works!

My Docker image build just ran dpkg -i, which starts a server and creates a data directory for a new cluster. I guess each container therefore carries its own separately initialized cluster state, and that this causes some kind of conflict when one container is added to another's cluster.
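
One way to see whether an image carries such leftover state is to check for a non-empty data directory before joining. The sketch below is only an illustration. The function name fdb_data_present is hypothetical, and the default path is assumed from the data directory mentioned earlier in this thread.

```shell
# Hypothetical check: succeed if the fdbserver data directory exists and is non-empty.
#   $1 data directory (assumed default: /var/lib/foundationdb/data/4500)
fdb_data_present() {
    data_dir="${1:-/var/lib/foundationdb/data/4500}"
    [ -d "$data_dir" ] && [ -n "$(ls -A "$data_dir" 2>/dev/null)" ]
}

# Possible usage in a fresh container, before starting fdbmonitor:
#   if fdb_data_present; then
#       echo "data directory was baked into the image; remove it before joining"
#   fi
```
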

fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

  Redundancy mode        - single
  Storage engine         - memory
  Coordinators           - 1

  FoundationDB processes - 2
  Machines               - 2
  Memory availability    - 3.4 GB per process on machine with least available
                           >>>>> (WARNING: 4.0 GB recommended) <<<<<
  Fault Tolerance        - 0 machines
  Server time            - 05/10/18 23:54:35

  Replication health     - (Re)initializing automatic data distribution
  Moving data            - unknown (initializing)
  Sum of key-value sizes - unknown
  Disk space used        - 0 MB

Operating space:
  Storage server         - 21.3 GB free on most full server
  Log server             - 21.3 GB free on most full server

  Read rate              - 19 Hz
  Write rate             - 2 Hz
  Transactions started   - 6 Hz
  Transactions committed - 1 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  (  8% cpu;  7% machine; 0.000 Gbps;  0% disk IO; 0.3 GB / 3.4 GB RAM  )
  (  5% cpu;  7% machine; 0.000 Gbps;  0% disk IO; 0.4 GB / 3.5 GB RAM  )

Coordination servers:  (reachable)

Client time: 05/10/18 23:54:35

See my reply to this other post for a possible explanation for this behavior.
