I dropped the dependency on the other database to focus on FoundationDB only. What convinced me was the talk about CouchDB at the FoundationDB Summit 2019. IBM, via CouchDB, is also interested in single-machine cluster support, which means that more people will use FoundationDB in that setup.
Spoiler: I am a newbie in big data, database operations, and system administration in general.
So, I put together some code and rented a dedicated server (bare metal). Here is the configuration:
- SSD: RAID 1 with two disks, 300G available space.
- RAM: 64G
- CPU: 40 cores (according to htop)
It cost around $250 per month.
I did not benchmark the SSD setup; I will need to do that on the next install.
I have installed fail2ban and set up iptables with the following script:
#!/bin/sh
# My system IP/set ip address of server
SERVER_IP="116.202.254.220"
# Flushing all rules
iptables -F
iptables -X
# Setting default filter policy
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
# Allow unlimited traffic on loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
# Allow incoming ssh only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 22 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
# Allow incoming http only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 80 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 80 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
# Allow incoming https only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 443 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 443 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
# Allow incoming git only
iptables -A INPUT -p tcp -s 0/0 -d $SERVER_IP --sport 513:65535 --dport 9418 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp -s $SERVER_IP -d 0/0 --sport 9418 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
# Allow DNS
iptables -A OUTPUT -p udp --dport 53 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT -p udp --sport 53 -m state --state ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp --dport 53 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp --sport 53 -m state --state ESTABLISHED -j ACCEPT
# make sure nothing comes or goes out of this box
iptables -A INPUT -j DROP
iptables -A OUTPUT -j DROP
There are two problems with that program:
- git clone and other git commands do not work,
- I need to manually call the above program on every reboot.
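The first problem is most likely because the script only accepts connections that other machines open to this server: the default OUTPUT policy is DROP, and no rule lets the server start a connection of its own, apart from DNS. A minimal sketch of the missing rules follows, assuming git should be able to reach remotes over ssh, http, https and the git protocol. The helper name `allow_outgoing_tcp` and the `IPT` dry-run variable are illustrative, not part of my script. The rules would have to go before the final catch-all DROP lines.

```shell
#!/bin/sh
# Dry run by default: the rules are printed, not applied.
# Run with IPT=iptables (as root) to apply them for real.
IPT="${IPT:-echo iptables}"
SERVER_IP="${SERVER_IP:-116.202.254.220}"

# Hypothetical helper: allow connections that this server opens to a
# remote TCP port, plus the replies that belong to those connections.
allow_outgoing_tcp() {
    port="$1"
    $IPT -A OUTPUT -p tcp -s "$SERVER_IP" --dport "$port" -m state --state NEW,ESTABLISHED -j ACCEPT
    $IPT -A INPUT -p tcp -d "$SERVER_IP" --sport "$port" -m state --state ESTABLISHED -j ACCEPT
}

# ssh, http, https and git: the transports that git clone can use.
for port in 22 80 443 9418; do
    allow_outgoing_tcp "$port"
done
```

For the second problem, one commonly used option on Ubuntu is the iptables-persistent package, which reloads saved rules at boot.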
I installed the FoundationDB client and server on Ubuntu 18.04, then ran the following commands:
# fdbcli
fdb> configure ssd
fdb> configure proxies=5
fdb> configure logs=8
My question is about the last two lines. I have taken that configuration from the very last section of the configuration documentation.
Here it is, with my questions:
In a FoundationDB cluster, each of the fdbserver processes perform different tasks. Each process is recruited to do a particular task based on its process class. For example, processes with class=storage are given preference to be recruited for doing storage server tasks, class=transaction are for log server processes and class=stateless are for stateless processes like proxies, resolvers, etc.,
What does class refer to? Is it a pointer to the underlying “roles” taken by each process?
The recommended minimum number of class=transaction (log server) processes is 8 (active) + 2 (standby) and the recommended minimum number for class=stateless processes is 4 (proxy) + 1 (resolver) + 1 (cluster controller) + 1 (master) + 2 (standby). It is better to spread the transaction and stateless processes across as many machines as possible.
That is where things become complex for me, because I do not see the terms class, transaction, or stateless in fdbcli configure:
fdb> configure
Usage: configure [new] <single|double|triple|three_data_hall|three_datacenter|ssd|memory|proxies=<PROXIES>|logs=<LOGS>|resolvers=<RESOLVERS>>*
At the end of the line, there are proxies=<PROXIES>, logs=<LOGS>, and resolvers=<RESOLVERS>.
So far, I have an nginx process with the configuration given in the documentation of gunicorn:
server {
    listen 80;
    server_name example.org;
    access_log /var/log/nginx/example.log;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
And 10 gunicorn workers. The gunicorn documentation says this about the number of workers:
This number should generally be between 2-4 workers per core in the server. Check the FAQ for ideas on tuning this parameter.
Running Gunicorn — Gunicorn 21.2.0 documentation
Since I use 10 workers, at 2 workers per core they occupy at most 10/2 = 5 cores, so at least 40 - 5 = 35 cores remain available.
I am loading 10G of data with a single CPython process. I estimate it will end up using 110G of disk space (because of my layer storage overhead!).
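The arithmetic behind that estimate can be spelled out as a small sketch. It reads the gunicorn guideline in reverse (workers to cores instead of cores to workers), which is my own interpretation rather than something the gunicorn documentation states; the variable names are illustrative.

```shell
#!/bin/sh
# Numbers from this post: 40 cores (htop) and 10 gunicorn workers.
total_cores=40
workers=10

# At 2 to 4 workers per core, 10 workers map to somewhere between
# ceil(10/4) and ceil(10/2) cores (integer ceiling division).
cores_if_4_per_core=$(( (workers + 3) / 4 ))
cores_if_2_per_core=$(( (workers + 1) / 2 ))

# Worst case for FoundationDB: gunicorn keeps the larger number busy.
free_cores_min=$(( total_cores - cores_if_2_per_core ))
free_cores_max=$(( total_cores - cores_if_4_per_core ))

echo "gunicorn: $cores_if_4_per_core to $cores_if_2_per_core cores"
echo "left for FoundationDB: $free_cores_min to $free_cores_max cores"
```

The lower bound, 35 cores, is the figure used in the rest of this post.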
Here is the result of status details:
# fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.
The database is available.
Welcome to the fdbcli. For help, type `help'.
fdb> status details
Using cluster file `/etc/foundationdb/fdb.cluster'.
Configuration:
  Redundancy mode - single
  Storage engine - ssd-2
  Coordinators - 1
  Desired Proxies - 5
  Desired Logs - 8

Cluster:
  FoundationDB processes - 1
  Zones - 1
  Machines - 1
  Memory availability - 61.0 GB per process on machine with least available
  Retransmissions rate - 0 Hz
  Fault Tolerance - 0 machines
  Server time - 06/21/20 09:38:47

Data:
  Replication health - Healthy
  Moving data - 0.000 GB
  Sum of key-value sizes - 63.005 GB
  Disk space used - 77.530 GB

Operating space:
  Storage server - 293.3 GB free on most full server
  Log server - 293.3 GB free on most full server

Workload:
  Read rate - 596 Hz
  Write rate - 7814 Hz
  Transactions started - 8 Hz
  Transactions committed - 2 Hz
  Conflict rate - 0 Hz

Backup and DR:
  Running backups - 0
  Running DRs - 0

Process performance details:
  127.0.0.1:4500 ( 56% cpu; 4% machine; 0.041 Gbps; 55% disk IO; 3.1 GB / 61.0 GB RAM )

Coordination servers:
  127.0.0.1:4500 (reachable)

Client time: 06/21/20 09:38:46
fdb>
My question is: given that I have 35 cores available, what fdbcli commands should I run, taking into consideration that there will be many more reads than writes?
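To make the question concrete, here is a rough core budget built from the recommended minimums quoted from the configuration documentation. It assumes one fdbserver process per core and that every core not taken by a transaction or stateless process could host a storage process. Both are my assumptions, not something I have confirmed.

```shell
#!/bin/sh
# Cores left after gunicorn, as estimated earlier in this post.
free_cores=35

# Recommended minimums quoted from the configuration documentation.
transaction=$(( 8 + 2 ))            # 8 active + 2 standby log servers
stateless=$(( 4 + 1 + 1 + 1 + 2 ))  # proxies, resolver, cluster controller, master, standby

# ASSUMPTION: one fdbserver process per core; the remainder goes to storage.
reserved=$(( transaction + stateless ))
storage=$(( free_cores - reserved ))

echo "transaction=$transaction stateless=$stateless storage=$storage"
```

Under those assumptions, 19 cores go to transaction and stateless processes, and 16 remain for storage processes.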
ref: follow up on Greenfield project: What is the best course of action? Is single node cluster ready for production?
ref: follow up on Copernic: versioned triple store