I’m having trouble understanding a few aspects of my cluster status. I’m running 5.2.5. I have read https://apple.github.io/foundationdb/administration.html#monitoring-cluster-status.
Here’s my status:
fdb> status

Using cluster file `fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 5

Cluster:
  FoundationDB processes - 35
  Machines               - 5
  Memory availability    - 4.6 GB per process on machine with least available
  Retransmissions rate   - 1 Hz
  Fault Tolerance        - 0 machines
  Server time            - 10/03/18 17:52:49

Data:
  Replication health     - Healthy (Rebalancing)
  Moving data            - 0.040 GB
  Sum of key-value sizes - 47.517 GB
  Disk space used        - 184.154 GB

Operating space:
  Storage server         - 227.7 GB free on most full server
  Log server             - 1.0 GB free on most full server

Workload:
  Read rate              - 105 Hz
  Write rate             - 5 Hz
  Transactions started   - 6 Hz
  Transactions committed - 1 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Client time: 10/03/18 17:52:49
I originally created this as triple ssd. I wanted to try various benchmarks with various storage schemes, so at one point I tried changing it to double memory and, when that became unhappy, to single memory. That still wasn’t happy, so I changed to single ssd, which is where I am now.
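To put numbers on the first question below, here is a quick Python check of the Data figures from the status output against the replication factor of each mode I used (single, double and triple keep 1, 2 and 3 copies respectively). The per-mode estimates are deliberately naive: they just multiply the logical data size by the replica count and ignore storage-engine and filesystem overhead, so this is a sketch of my expectation, not a model of how FoundationDB actually accounts for disk usage.

```python
# Figures copied from the `status` output above, in GB.
SUM_KV_GB = 47.517
DISK_USED_GB = 184.154

# Number of copies kept by each redundancy mode.
REPLICAS = {"single": 1, "double": 2, "triple": 3}


def expected_disk_gb(mode: str) -> float:
    """Naive estimate: logical data size times replica count, ignoring all overhead."""
    return SUM_KV_GB * REPLICAS[mode]


# How much disk is used per logical byte, as reported.
ratio = DISK_USED_GB / SUM_KV_GB

for mode in REPLICAS:
    print(f"{mode:>6}: naive estimate {expected_disk_gb(mode):6.1f} GB, "
          f"observed {DISK_USED_GB:.1f} GB")
print(f"observed / logical = {ratio:.2f}x")
```

Even the naive triple estimate (about 143 GB) is below the observed 184 GB, and the single estimate (about 48 GB) is nowhere near it, which is the mismatch I'm asking about.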
Many things about the status output confuse me:
- What is the relationship between “disk space used” and “sum of key-value sizes”? When I was on triple ssd, the numbers were roughly the same as now (47 GB / 189 GB). I had assumed that “disk space used” would account for data replication, but it didn’t drop when I changed from triple to single. How much memory would I need in order to change from ssd to memory? I had assumed it would be 47 GB plus some overhead, but it seems like I might need 180+ GB of memory?
- Why does it say that the operating space for the log server is “1.0 GB free on most full server”? All of the servers have plenty of free disk space (hundreds of GBs) on the volume where the log directory lives.
- When I refresh status, “Replication health” tends to cycle between “Healthy” and “Healthy (Rebalancing)”, along with a small nonzero amount of “Moving data”. This cluster is completely idle, so what is it doing?
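For anyone who wants to watch the same fields programmatically, here is a rough Python sketch that runs `fdbcli --exec "status json"` and pulls out the values behind the disk-usage and rebalancing questions. The JSON paths it reads (`cluster.data.total_kv_size_bytes`, `cluster.data.total_disk_used_bytes`, `cluster.data.state.name`, `cluster.data.moving_data.in_flight_bytes`) are my assumptions from reading the status JSON, not something I've confirmed for 5.2.5, so check them against your own output; missing fields come back as None. It also assumes fdbcli is on PATH, finds the cluster file on its own, and prints only the JSON document. The helper names `fetch_status`, `summarize` and `watch` are my own, not part of any FoundationDB API.

```python
import json
import subprocess
import time
from typing import Any, Dict, Optional


def fetch_status() -> Dict[str, Any]:
    """Run `fdbcli --exec "status json"` and parse its output (assumes fdbcli is on PATH)."""
    result = subprocess.run(
        ["fdbcli", "--exec", "status json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def summarize(status: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Extract the fields discussed above; the JSON paths are assumptions to verify."""
    data = status.get("cluster", {}).get("data", {})
    kv_bytes = data.get("total_kv_size_bytes")
    disk_bytes = data.get("total_disk_used_bytes")
    return {
        "state": data.get("state", {}).get("name"),
        "moving_bytes": data.get("moving_data", {}).get("in_flight_bytes"),
        # Disk used per logical byte; None if either field is missing.
        "disk_to_kv_ratio": (
            disk_bytes / kv_bytes if kv_bytes and disk_bytes is not None else None
        ),
    }


def watch(samples: int = 10, interval_s: float = 5.0) -> None:
    """Poll status repeatedly to see how often the data state changes (hypothetical helper)."""
    for _ in range(samples):
        print(summarize(fetch_status()))
        time.sleep(interval_s)
```

Calling `watch()` against the idle cluster should show whether the state name alternates between a plain healthy value and a rebalancing one while the in-flight byte count stays small.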