How to add a server container to a cluster

I'm using the same Docker image as in https://forums.foundationdb.org/t/simple-dockerfile

I’m trying to add a container to a cluster by following these instructions:
https://apple.github.io/foundationdb/administration.html#adding-machines-to-a-cluster

Start the first server in a container.

$ docker run -ti --rm --name=fdb -h fdb  hiroshi3110/foundationdb:5.1.5-1_ubuntu-16.04 bash
root@fdb:~# cat /etc/foundationdb/fdb.cluster 
8YoJRcwD:WZ545WZr@127.0.0.1:4500
root@fdb:~# getent hosts fdb
172.17.0.2      fdb
root@fdb:~# echo 8YoJRcwD:WZ545WZr@172.17.0.2:4500 > /etc/foundationdb/fdb.cluster
root@fdb:~# /usr/lib/foundationdb/fdbmonitor 
Time="1525953600.858815" Severity="10" LogGroup="default" Process="fdbmonitor": Started FoundationDB Process Monitor 5.1 (v5.1.5)
Time="1525953600.865933" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf file /etc/foundationdb/foundationdb.conf
Time="1525953600.866062" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf dir /etc/foundationdb (2)
Time="1525953600.866092" Severity="10" LogGroup="default" Process="fdbmonitor": Loading configuration /etc/foundationdb/foundationdb.conf
Time="1525953600.868531" Severity="10" LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
Time="1525953600.869477" Severity="10" LogGroup="default" Process="fdbmonitor": Starting fdbserver.4500
Time="1525953600.870077" Severity="10" LogGroup="default" Process="backup_agent.1": Launching /usr/lib/foundationdb/backup_agent/backup_agent (16) for backup_agent.1
Time="1525953600.873488" Severity="10" LogGroup="default" Process="fdbserver.4500": Launching /usr/sbin/fdbserver (17) for fdbserver.4500
Time="1525953600.922129" Severity="10" LogGroup="default" Process="fdbserver.4500": FDBD joined cluster.

It seems to work:

$ docker exec -ti fdb fdbcli              
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type `help'.
fdb> 
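
The cluster-file edit in the first step (swapping 127.0.0.1 for the container's own address) could be scripted roughly as below. This is only a sketch: the helper names cluster_line_with_address and point_cluster_file_at are made up for illustration, and it assumes a cluster file with a single coordinator, as in the transcript above.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers, not part of FoundationDB.

# cluster_line_with_address LINE ADDR
# Print LINE with everything after the first '@' replaced by ADDR.
# Assumes a single-coordinator cluster file (description:id@ip:port).
cluster_line_with_address() {
    old_line="$1"
    new_addr="$2"
    # ${old_line%%@*} keeps only the "description:id" part before the '@'.
    printf '%s@%s\n' "${old_line%%@*}" "$new_addr"
}

# point_cluster_file_at FILE ADDR
# Rewrite FILE in place so its coordinator address becomes ADDR.
point_cluster_file_at() {
    cluster_file="$1"
    coordinator="$2"
    # Read the current contents first, so the redirect below cannot truncate
    # the file before it has been read.
    current=$(cat "$cluster_file") || return 1
    cluster_line_with_address "$current" "$coordinator" > "$cluster_file"
}
```

Inside the first container this might be called as point_cluster_file_at /etc/foundationdb/fdb.cluster "$(getent hosts fdb | awk '{ print $1 }'):4500", which should produce the same line as the manual echo above.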

Then I add another server to the cluster:

$ docker run -ti --rm --name=fdb2 -h fdb2 hiroshi3110/foundationdb:5.1.5-1_ubuntu-16.04 bash 
root@fdb2:~# echo 8YoJRcwD:WZ545WZr@172.17.0.2:4500 > /etc/foundationdb/fdb.cluster
root@fdb2:~# cat /etc/foundationdb/fdb.cluster 
8YoJRcwD:WZ545WZr@172.17.0.2:4500
root@fdb2:~# /usr/lib/foundationdb/fdbmonitor
Time="1525953996.607291" Severity="10" LogGroup="default" Process="fdbmonitor": Started FoundationDB Process Monitor 5.1 (v5.1.5)
Time="1525953996.614169" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf file /etc/foundationdb/foundationdb.conf
Time="1525953996.614269" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf dir /etc/foundationdb (2)
Time="1525953996.614329" Severity="10" LogGroup="default" Process="fdbmonitor": Loading configuration /etc/foundationdb/foundationdb.conf
Time="1525953996.615638" Severity="10" LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
Time="1525953996.616053" Severity="10" LogGroup="default" Process="fdbmonitor": Starting fdbserver.4500
Time="1525953996.618680" Severity="10" LogGroup="default" Process="backup_agent.1": Launching /usr/lib/foundationdb/backup_agent/backup_agent (13) for backup_agent.1
Time="1525953996.620875" Severity="10" LogGroup="default" Process="fdbserver.4500": Launching /usr/sbin/fdbserver (14) for fdbserver.4500
Time="1525953996.655526" Severity="10" LogGroup="default" Process="fdbserver.4500": FDBD joined cluster.

The database is now unavailable:

$ docker exec -ti fdb fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Unable to locate the master worker.

Unable to retrieve all status information.

Configuration:
  Redundancy mode        - single
  Storage engine         - memory
  Coordinators           - 1

Cluster:
  FoundationDB processes - 1
  Machines               - 1
  Memory availability    - 3.4 GB per process on machine with least available
                           >>>>> (WARNING: 4.0 GB recommended) <<<<<
  Fault Tolerance        - 0 machines
  Server time            - 05/10/18 12:08:58

Data:
  Replication health     - unknown
  Moving data            - unknown
  Sum of key-value sizes - unknown
  Disk space used        - unknown

Operating space:
  Unable to retrieve operating space status

Workload:
  Read rate              - unknown
  Write rate             - unknown
  Transactions started   - unknown
  Transactions committed - unknown
  Conflict rate          - unknown

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  172.17.0.3:4500        (  2% cpu;  5% machine; 0.000 Gbps;  0% disk IO; 0.3 GB / 3.4 GB RAM  )

Coordination servers:
  172.17.0.2:4500  (reachable)

Client time: 05/10/18 12:08:55

fdb> 

What’s wrong with those steps?

I had a situation with similar symptoms while setting up clusters from pre-baked images. What worked for me:

  1. Stop the foundationdb service on the new node.
  2. Wipe its local storage (sudo rm -rf /var/lib/foundationdb/data/4500/).
  3. Restart the service.
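
The three steps above can be sketched as a small shell script. Treat it as an illustration under assumptions, not a tested procedure: the function names wipe_fdb_data and reset_fdb_node are invented here, and it assumes the foundationdb init script installed by the Debian package is available through service.

```shell
#!/bin/sh
# Sketch only: wipe_fdb_data and reset_fdb_node are hypothetical helper names.

# wipe_fdb_data DIR
# Remove DIR and everything under it, refusing obviously dangerous arguments.
wipe_fdb_data() {
    target_dir="$1"
    case "$target_dir" in
        ""|"/")
            echo "refusing to wipe '$target_dir'" >&2
            return 1
            ;;
    esac
    rm -rf -- "$target_dir"
}

# reset_fdb_node [DIR]
# Stop the service, wipe the local data directory, then start the service again.
# DIR defaults to the path used in step 2 above.
reset_fdb_node() {
    data_dir="${1:-/var/lib/foundationdb/data/4500}"
    # Assumption: the package's init script is installed and usable here.
    service foundationdb stop &&
        wipe_fdb_data "$data_dir" &&
        service foundationdb start
}
```

In a container where fdbmonitor is started by hand, as in the question, only the wipe step applies, and it would run before fdbmonitor is launched.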

To put things into context, here is the bigger script (part of a Terraform config)

Thanks!
Removing the data directory of the second process before adding it to the cluster works!

My Docker image build just ran dpkg -i, which starts a server and creates a data directory for a new cluster. I guess that this separately initialized cluster state causes some kind of conflict when the container is added to another cluster.
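
If that guess is right, a container could check for leftover data before fdbmonitor starts. Here is a minimal sketch; the function names data_dir_is_empty and warn_if_preinitialized are hypothetical, and the default path is the one mentioned in the earlier reply.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# data_dir_is_empty DIR
# Succeed if DIR does not exist or contains no entries (hidden files count).
data_dir_is_empty() {
    check_dir="$1"
    if [ ! -d "$check_dir" ]; then
        return 0
    fi
    # ls -A lists every entry except . and .., so empty output means empty dir.
    [ -z "$(ls -A -- "$check_dir")" ]
}

# warn_if_preinitialized [DIR]
# Print a warning and fail if DIR already holds files from an earlier run.
warn_if_preinitialized() {
    fdb_dir="${1:-/var/lib/foundationdb/data/4500}"
    if ! data_dir_is_empty "$fdb_dir"; then
        echo "warning: $fdb_dir already contains data from an earlier fdbserver run" >&2
        return 1
    fi
}
```

Calling warn_if_preinitialized before launching fdbmonitor in a fresh container would flag an image whose build step already initialized its own database.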

fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - memory
  Coordinators           - 1

Cluster:
  FoundationDB processes - 2
  Machines               - 2
  Memory availability    - 3.4 GB per process on machine with least available
                           >>>>> (WARNING: 4.0 GB recommended) <<<<<
  Fault Tolerance        - 0 machines
  Server time            - 05/10/18 23:54:35

Data:
  Replication health     - (Re)initializing automatic data distribution
  Moving data            - unknown (initializing)
  Sum of key-value sizes - unknown
  Disk space used        - 0 MB

Operating space:
  Storage server         - 21.3 GB free on most full server
  Log server             - 21.3 GB free on most full server

Workload:
  Read rate              - 19 Hz
  Write rate             - 2 Hz
  Transactions started   - 6 Hz
  Transactions committed - 1 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  172.24.0.2:4500        (  8% cpu;  7% machine; 0.000 Gbps;  0% disk IO; 0.3 GB / 3.4 GB RAM  )
  172.24.0.3:4500        (  5% cpu;  7% machine; 0.000 Gbps;  0% disk IO; 0.4 GB / 3.5 GB RAM  )

Coordination servers:
  172.24.0.2:4500  (reachable)

Client time: 05/10/18 23:54:35

See my reply to this other post for a possible explanation for this behavior.
