Getting started with FoundationDB: Single node cluster deployment

I dropped the dependency on the other database to focus on FoundationDB only. What convinced me is the talk about CouchDB at FoundationDB Summit 2019. IBM, via CouchDB, is also interested in single-machine cluster support, and that means more people will use FoundationDB in that setup.

Spoiler: I am a newbie in big data, database operations, and system administration in general.

So, I put together some code and rented a dedicated server (bare metal). Here is the configuration:

  • SSD: RAID 1 with two disks, 300G available space.
  • RAM: 64G
  • CPU: 40 cores (according to htop)

It costs around $250 per month.

I did not benchmark the SSD setup; I will need to do that on the next install.

I have installed fail2ban and set up iptables with the following program:

#!/bin/sh
# My system IP/set ip address of server
SERVER_IP="116.202.254.220"
# Flushing all rules
iptables -F
iptables -X
# Setting default filter policy
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
# Allow unlimited traffic on loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Allow incoming ssh only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 22 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT

# Allow incoming http only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 80 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 80 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT

# Allow incoming https only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 443 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 443 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT

# Allow incoming git only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 9418 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 9418 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT

# Allow DNS
iptables -A OUTPUT -p udp --dport 53 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT  -p udp --sport 53 -m state --state ESTABLISHED     -j ACCEPT
iptables -A OUTPUT -p tcp --dport 53 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT  -p tcp --sport 53 -m state --state ESTABLISHED     -j ACCEPT

# make sure nothing comes or goes out of this box
iptables -A INPUT -j DROP
iptables -A OUTPUT -j DROP

There are two problems with that program:

  • git clone and other git commands do not work,
  • I need to manually call the above program on every reboot.
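
One possible reading of the first problem is that the OUTPUT chain only accepts replies to incoming connections (plus DNS), so a connection that the server itself opens never gets out. The sketch below is a toy model of the OUTPUT rules in Python, not real netfilter behavior: it ignores the source IP match, reduces connection tracking to a "NEW"/"ESTABLISHED" label, and the interface name "eth0" and the ephemeral port 40000 are made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

PortRange = Tuple[int, int]


@dataclass(frozen=True)
class Packet:
    """An outgoing packet, reduced to the fields the OUTPUT rules look at."""
    proto: str            # "tcp" or "udp"
    sport: int            # source port on the server
    dport: int            # destination port on the remote host
    state: str            # simplified connection state: "NEW" or "ESTABLISHED"
    iface: str = "eth0"   # hypothetical name of the non-loopback interface


@dataclass(frozen=True)
class Rule:
    """One ACCEPT rule; a field left as None matches anything."""
    proto: Optional[str] = None
    sport: Optional[PortRange] = None
    dport: Optional[PortRange] = None
    states: Optional[FrozenSet[str]] = None
    iface: Optional[str] = None

    def matches(self, p: Packet) -> bool:
        return (
            (self.iface is None or self.iface == p.iface)
            and (self.proto is None or self.proto == p.proto)
            and (self.sport is None or self.sport[0] <= p.sport <= self.sport[1])
            and (self.dport is None or self.dport[0] <= p.dport <= self.dport[1])
            and (self.states is None or p.state in self.states)
        )


HIGH_PORTS: PortRange = (513, 65535)
ESTABLISHED = frozenset({"ESTABLISHED"})
NEW_OR_ESTABLISHED = frozenset({"NEW", "ESTABLISHED"})

# The OUTPUT rules of the program above, in the same order.
OUTPUT_RULES: List[Rule] = [Rule(iface="lo")]
OUTPUT_RULES += [
    Rule(proto="tcp", sport=(port, port), dport=HIGH_PORTS, states=ESTABLISHED)
    for port in (22, 80, 443, 9418)
]
OUTPUT_RULES += [
    Rule(proto="udp", dport=(53, 53), states=NEW_OR_ESTABLISHED),
    Rule(proto="tcp", dport=(53, 53), states=NEW_OR_ESTABLISHED),
]


def output_verdict(p: Packet) -> str:
    """Any matching rule accepts; otherwise the final DROP applies."""
    return "ACCEPT" if any(r.matches(p) for r in OUTPUT_RULES) else "DROP"


# First packet of an outgoing clone over https, then over the git protocol.
print("clone over https:", output_verdict(Packet("tcp", 40000, 443, "NEW")))
print("clone over git://:", output_verdict(Packet("tcp", 40000, 9418, "NEW")))
# Reply to an incoming ssh session, and an outgoing DNS query.
print("ssh reply:", output_verdict(Packet("tcp", 22, 50000, "ESTABLISHED")))
print("dns query:", output_verdict(Packet("udp", 40000, 53, "NEW")))
```

In this model both clone attempts fall through to the final DROP, while ssh replies and DNS queries are accepted. If that reading is right, outgoing git traffic would need OUTPUT rules that accept NEW connections by destination port, in the same shape as the DNS rules, plus matching INPUT rules for the replies. That change is untested here.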

I installed the client and server on Ubuntu 18.04, then ran the following commands:

# fdbcli
fdb> configure ssd
fdb> configure proxies=5
fdb> configure logs=8

My question is about the last two lines. I have taken that configuration from the very last section of the configuration documentation.

Here it is, with my questions:

In a FoundationDB cluster, each of the fdbserver processes perform different tasks. Each process is recruited to do a particular task based on its process class. For example, processes with class=storage are given preference to be recruited for doing storage server tasks, class=transaction are for log server processes and class=stateless are for stateless processes like proxies, resolvers, etc.,

What does class refer to? Is it a pointer to the underlying “roles” taken by each process?

The recommended minimum number of class=transaction (log server) processes is 8 (active) + 2 (standby) and the recommended minimum number for class=stateless processes is 4 (proxy) + 1 (resolver) + 1 (cluster controller) + 1 (master) + 2 (standby). It is better to spread the transaction and stateless processes across as many machines as possible.

That is where things become complex for me, because I do not see the terms class, transaction, or stateless in fdbcli configure:

fdb> configure
Usage: configure [new] <single|double|triple|three_data_hall|three_datacenter|ssd|memory|proxies=<PROXIES>|logs=<LOGS>|resolvers=<RESOLVERS>>*

At the end of the line, there is proxies=<PROXIES>, logs=<LOGS>, and resolvers=<RESOLVERS>.

So far, I have an nginx process with the configuration given in the gunicorn documentation:

  server {
    listen 80;
    server_name example.org;
    access_log  /var/log/nginx/example.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }

And 10 gunicorn workers. The gunicorn documentation says this about the number of workers:

This number should generally be between 2-4 workers per core in the server. Check the FAQ for ideas on tuning this parameter.
https://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments

Since I use 10 workers, and at 2 workers per core they account for at most 10/2 = 5 cores, there are at least 40 - 10/2 = 35 cores available.
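
The same estimate, written out as a small Python sketch for both ends of the range that gunicorn suggests. It only counts gunicorn workers; nginx, the loader process, and the operating system are not budgeted, and cores_left is a name made up for this example.

```python
def cores_left(total_cores: int, workers: int, workers_per_core: int) -> float:
    """Cores remaining after reserving workers / workers_per_core for gunicorn."""
    return total_cores - workers / workers_per_core


TOTAL_CORES = 40  # from htop, as listed in the server configuration
WORKERS = 10      # gunicorn workers currently running

for workers_per_core in (2, 4):  # both ends of gunicorn's suggested range
    left = cores_left(TOTAL_CORES, WORKERS, workers_per_core)
    print(f"{workers_per_core} workers per core: {left} cores left")
```

The 2 workers per core case gives the conservative figure of 35 used in the rest of this post.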

I am loading 10G of data with a single CPython process; I estimate it will end up using 110G of disk space (because of my layer's storage overhead!).

Here is the result of status details:

# fdbcli 
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 1
  Desired Proxies        - 5
  Desired Logs           - 8

Cluster:
  FoundationDB processes - 1
  Zones                  - 1
  Machines               - 1
  Memory availability    - 61.0 GB per process on machine with least available
  Retransmissions rate   - 0 Hz
  Fault Tolerance        - 0 machines
  Server time            - 06/21/20 09:38:47

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 63.005 GB
  Disk space used        - 77.530 GB

Operating space:
  Storage server         - 293.3 GB free on most full server
  Log server             - 293.3 GB free on most full server

Workload:
  Read rate              - 596 Hz
  Write rate             - 7814 Hz
  Transactions started   - 8 Hz
  Transactions committed - 2 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  127.0.0.1:4500         ( 56% cpu;  4% machine; 0.041 Gbps; 55% disk IO; 3.1 GB / 61.0 GB RAM  )

Coordination servers:
  127.0.0.1:4500  (reachable)

Client time: 06/21/20 09:38:46

fdb> 
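
The figures in that output that matter for my question can be pulled out with a small parser. This is a sketch in Python that assumes the plain-text layout shown above (unindented "Section:" headers followed by indented "Key - value" lines); SAMPLE is a shortened copy of that output, and parse_status is a helper name made up for the example.

```python
import re

# Shortened copy of the `status details` output above.
SAMPLE = """\
Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 1
  Desired Proxies        - 5
  Desired Logs           - 8

Cluster:
  FoundationDB processes - 1
  Zones                  - 1
  Machines               - 1

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 63.005 GB
  Disk space used        - 77.530 GB
"""

SECTION = re.compile(r"^(\S.*):$")          # e.g. "Configuration:"
FIELD = re.compile(r"^\s+(.+?)\s+-\s+(.+)$")  # e.g. "  Desired Logs  - 8"


def parse_status(text: str) -> dict:
    """Group 'Key - value' lines under the section header that precedes them."""
    report: dict = {}
    section = None
    for raw in text.splitlines():
        line = raw.rstrip()
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            report[section] = {}
            continue
        field = FIELD.match(line)
        if field and section is not None:
            report[section][field.group(1)] = field.group(2)
    return report


report = parse_status(SAMPLE)
config, cluster, data = report["Configuration"], report["Cluster"], report["Data"]

processes = int(cluster["FoundationDB processes"])
desired = int(config["Desired Proxies"]) + int(config["Desired Logs"])
kv_gb = float(data["Sum of key-value sizes"].split()[0])
disk_gb = float(data["Disk space used"].split()[0])

print(f"fdbserver processes: {processes}")
print(f"desired proxies + logs: {desired}")
print(f"disk used / key-value size: {disk_gb / kv_gb:.2f}")
```

It makes the mismatch visible: 5 proxies and 8 logs are requested, but only one fdbserver process is running. For anything beyond a quick look, fdbcli also has a status json command that is meant for machine consumption.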

My question is: given that I have 35 cores available, what are the correct fdbcli commands to run, taking into consideration that there will be many more reads than writes?

ref: follow up on Greenfield project: What is the best course of action? Is single node cluster ready for production?
ref: follow up on Copernic: versioned triple store