Hi,
Finally, my cluster is up and running, so I was able to do some significant testing. At this point, we’re using it as a cache, with very simple transaction logic:
_, err = db.Transact(func(tr fdb.Transaction) (ret interface{}, err error) {
	// Look up the cache entry for this ID.
	future := tr.Get(sub.Pack(tuple.Tuple{"cache", ID}))
	res := future.MustGet()
	if len(res) == 0 {
		// Not cached yet: store the current timestamp and queue the ID for processing.
		tr.Set(sub.Pack(tuple.Tuple{"cache", ID}), []byte(strconv.FormatInt(time.Now().UnixNano(), 10)))
		processChan <- ID
	}
	return
})
I tested three different deployments, all on GCP (Google Cloud):
1) 3 servers, each with 4 vCPUs, 16 GB RAM, and a 1 TB SSD
2) 3 servers, each with 32 vCPUs, 120 GB RAM, and a 1 TB SSD
3) 11 servers, each with 4 vCPUs, 16 GB RAM, and a 375 GB NVMe SSD
I’m running FDB with double redundancy and the ssd storage engine. Each server runs one process per vCPU (the 4-vCPU machines run 4 processes, the 32-vCPU machines run 32, and so on). Each server also runs one coordinator, so 3 servers means 3 coordinators.
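For context, the one-process-per-vCPU layout is just a matter of listing one [fdbserver.<port>] section per process in foundationdb.conf. Below is an illustrative sketch for a 4-vCPU machine, using the stock paths and ports rather than my exact file:

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
cluster_file = /etc/foundationdb/fdb.cluster

[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb

## One section per process; a 32-vCPU machine would list 32 ports.
[fdbserver.4500]
[fdbserver.4501]
[fdbserver.4502]
[fdbserver.4503]
## A process class can be pinned per section, e.g. "class = storage"
## under an individual [fdbserver.<port>] (related to my question below).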
There are 80 client servers that run the above code (and do some other work), each with 60 goroutines (each goroutine creates its own connection to FDB).
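To make the client side concrete, here is a minimal, self-contained sketch of that setup: 60 goroutines, each opening its own database handle and running the same check-and-mark transaction shown at the top of the post. The "app" subspace, the idSource channel, the API version, and the minimal error handling are illustrative assumptions, not our actual production code:

package main

import (
	"log"
	"strconv"
	"sync"
	"time"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
	"github.com/apple/foundationdb/bindings/go/src/fdb/subspace"
	"github.com/apple/foundationdb/bindings/go/src/fdb/tuple"
)

func main() {
	fdb.MustAPIVersion(520) // pick the API version matching the installed client

	sub := subspace.Sub("app")             // illustrative application subspace
	idSource := make(chan string, 1024)    // hypothetical feed of IDs to check
	processChan := make(chan string, 1024) // IDs that still need processing (consumer omitted)

	var wg sync.WaitGroup
	for i := 0; i < 60; i++ { // 60 goroutines per client server
		wg.Add(1)
		go func() {
			defer wg.Done()
			db := fdb.MustOpenDefault() // each goroutine opens its own handle
			for ID := range idSource {
				_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
					key := sub.Pack(tuple.Tuple{"cache", ID})
					if len(tr.Get(key).MustGet()) == 0 {
						// Not cached yet: store a timestamp and queue the ID.
						tr.Set(key, []byte(strconv.FormatInt(time.Now().UnixNano(), 10)))
						processChan <- ID
					}
					return nil, nil
				})
				if err != nil {
					log.Println("cache check failed:", err)
				}
			}
		}()
	}
	wg.Wait()
}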
With setup 1), I started running into issues: disk IO was too high and latency was way up.
Then I switched to 2), mainly because GCP ties SSD performance to the number of cores a server has; performance maxes out at around 32 cores and 1 TB, especially for writes.
This setup immediately showed improvements, with disk throughput (for writes) running as high as 300 MB/s on those three servers.
And finally I switched to 3), with many more servers, all running NVMe disks. This setup has been running in a test “production” for the last 3 days, so I have the most data for it. One of the confusing things I’m seeing is that the servers report ~90% disk IO, yet neither IOPS nor throughput reaches any significant level, and I can’t push it any higher. I thought NVMe would show the best results, but it’s far from it.
Here is a dump from status details:
fdb> status details
Using cluster file `fdb.cluster'.
Could not communicate with all of the coordination servers.
The database will remain operational as long as we
can connect to a quorum of servers, however the fault
tolerance of the system is reduced as long as the
servers remain disconnected.
10.240.0.252:4500 (reachable)
10.240.1.34:4500 (unreachable)
10.240.1.155:4500 (reachable)
10.240.1.171:4500 (reachable)
10.240.2.103:4500 (reachable)
10.240.3.144:4500 (reachable)
10.240.3.192:4500 (reachable)
10.240.4.228:4500 (unreachable)
10.240.5.32:4500 (reachable)
10.240.5.59:4500 (reachable)
10.240.10.139:4500 (reachable)
Configuration:
Redundancy mode - double
Storage engine - ssd-2
Coordinators - 11
Cluster:
FoundationDB processes - 36
Machines - 9
Memory availability - 3.8 GB per process on machine with least available
>>>>> (WARNING: 4.0 GB recommended) <<<<<
Retransmissions rate - 2 Hz
Fault Tolerance - 1 machine
Server time - 07/30/18 14:10:22
Data:
Replication health - Healthy
Moving data - 0.000 GB
Sum of key-value sizes - 114.105 GB
Disk space used - 298.159 GB
Operating space:
Storage server - 321.6 GB free on most full server
Log server - 321.6 GB free on most full server
Workload:
Read rate - 90375 Hz
Write rate - 7996 Hz
Transactions started - 90130 Hz
Transactions committed - 2664 Hz
Conflict rate - 200 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Process performance details:
10.240.0.252:4500 ( 29% cpu; 31% machine; 0.018 Gbps; 50% disk IO; 2.9 GB / 3.8 GB RAM )
10.240.0.252:4501 ( 26% cpu; 31% machine; 0.018 Gbps; 49% disk IO; 2.9 GB / 3.8 GB RAM )
10.240.0.252:4502 ( 26% cpu; 31% machine; 0.018 Gbps; 49% disk IO; 2.8 GB / 3.8 GB RAM )
10.240.0.252:4503 ( 30% cpu; 31% machine; 0.018 Gbps; 49% disk IO; 2.8 GB / 3.8 GB RAM )
10.240.1.155:4500 ( 64% cpu; 48% machine; 0.049 Gbps; 46% disk IO; 3.1 GB / 3.9 GB RAM )
10.240.1.155:4501 ( 64% cpu; 48% machine; 0.049 Gbps; 48% disk IO; 2.8 GB / 3.9 GB RAM )
10.240.1.155:4502 ( 27% cpu; 48% machine; 0.049 Gbps; 48% disk IO; 2.7 GB / 3.9 GB RAM )
10.240.1.155:4503 ( 27% cpu; 48% machine; 0.049 Gbps; 49% disk IO; 3.0 GB / 3.9 GB RAM )
10.240.1.171:4500 ( 31% cpu; 30% machine; 0.017 Gbps; 46% disk IO; 3.0 GB / 3.8 GB RAM )
10.240.1.171:4501 ( 28% cpu; 30% machine; 0.017 Gbps; 47% disk IO; 2.7 GB / 3.8 GB RAM )
10.240.1.171:4502 ( 30% cpu; 30% machine; 0.017 Gbps; 47% disk IO; 2.9 GB / 3.8 GB RAM )
10.240.1.171:4503 ( 29% cpu; 30% machine; 0.017 Gbps; 47% disk IO; 2.7 GB / 3.8 GB RAM )
10.240.2.103:4500 ( 32% cpu; 36% machine; 0.030 Gbps; 48% disk IO; 3.1 GB / 3.9 GB RAM )
10.240.2.103:4501 ( 29% cpu; 36% machine; 0.030 Gbps; 48% disk IO; 2.8 GB / 3.9 GB RAM )
10.240.2.103:4502 ( 36% cpu; 36% machine; 0.030 Gbps; 49% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.2.103:4503 ( 37% cpu; 36% machine; 0.030 Gbps; 49% disk IO; 2.8 GB / 3.9 GB RAM )
10.240.3.144:4500 ( 31% cpu; 32% machine; 0.017 Gbps; 47% disk IO; 2.8 GB / 3.9 GB RAM )
10.240.3.144:4501 ( 30% cpu; 32% machine; 0.017 Gbps; 46% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.3.144:4502 ( 31% cpu; 32% machine; 0.017 Gbps; 47% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.3.144:4503 ( 29% cpu; 32% machine; 0.017 Gbps; 46% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.3.192:4500 ( 35% cpu; 53% machine; 0.042 Gbps; 84% disk IO; 3.0 GB / 3.9 GB RAM )
10.240.3.192:4501 ( 68% cpu; 53% machine; 0.042 Gbps; 84% disk IO; 3.0 GB / 3.9 GB RAM )
10.240.3.192:4502 ( 33% cpu; 53% machine; 0.042 Gbps; 84% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.3.192:4503 ( 74% cpu; 53% machine; 0.042 Gbps; 84% disk IO; 2.8 GB / 3.9 GB RAM )
10.240.5.32:4500 ( 65% cpu; 43% machine; 0.034 Gbps; 88% disk IO; 3.0 GB / 3.9 GB RAM )
10.240.5.32:4501 ( 36% cpu; 43% machine; 0.034 Gbps; 89% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.5.32:4502 ( 33% cpu; 43% machine; 0.034 Gbps; 86% disk IO; 3.0 GB / 3.9 GB RAM )
10.240.5.32:4503 ( 30% cpu; 43% machine; 0.034 Gbps; 84% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.5.59:4500 ( 31% cpu; 29% machine; 0.018 Gbps; 47% disk IO; 3.0 GB / 3.8 GB RAM )
10.240.5.59:4501 ( 27% cpu; 29% machine; 0.018 Gbps; 47% disk IO; 2.9 GB / 3.8 GB RAM )
10.240.5.59:4502 ( 29% cpu; 29% machine; 0.018 Gbps; 47% disk IO; 2.9 GB / 3.8 GB RAM )
10.240.5.59:4503 ( 28% cpu; 29% machine; 0.018 Gbps; 47% disk IO; 2.7 GB / 3.8 GB RAM )
10.240.10.139:4500 ( 26% cpu; 36% machine; 0.035 Gbps; 87% disk IO; 3.1 GB / 3.9 GB RAM )
10.240.10.139:4501 ( 23% cpu; 36% machine; 0.035 Gbps; 83% disk IO; 2.8 GB / 3.9 GB RAM )
10.240.10.139:4502 ( 29% cpu; 36% machine; 0.035 Gbps; 86% disk IO; 2.9 GB / 3.9 GB RAM )
10.240.10.139:4503 ( 58% cpu; 36% machine; 0.035 Gbps; 87% disk IO; 3.1 GB / 3.9 GB RAM )
Client time: 07/30/18 14:10:20
I would appreciate any pointers on the best setup for getting the highest utilization out of these resources.
Some of my questions:
- Is it better to have many smaller servers or a few large ones?
- What is the optimal configuration for a single server?
- I’m still pretty confused by process classes. Do I set them up manually, or do I let the client/server decide which processes to dedicate to which roles?
- It seems like NVMe disks may not be optimal for FDB. Is it better to use regular SSDs instead?
Anything else that you would suggest?
BTW, I must say the resiliency of the system is mind-blowing. I took servers down one at a time to expand them, and everything worked incredibly well. Each time I brought a server back, the whole cluster synced within a couple of minutes and I could move on to the next one.