I’m having trouble understanding a few aspects of my cluster status. I’m running 5.2.5. I have read https://apple.github.io/foundationdb/administration.html#monitoring-cluster-status.
Here’s my status:
fdb> status

Using cluster file `fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 5

Cluster:
  FoundationDB processes - 35
  Machines               - 5
  Memory availability    - 4.6 GB per process on machine with least available
  Retransmissions rate   - 1 Hz
  Fault Tolerance        - 0 machines
  Server time            - 10/03/18 17:52:49

Data:
  Replication health     - Healthy (Rebalancing)
  Moving data            - 0.040 GB
  Sum of key-value sizes - 47.517 GB
  Disk space used        - 184.154 GB

Operating space:
  Storage server         - 227.7 GB free on most full server
  Log server             - 1.0 GB free on most full server

Workload:
  Read rate              - 105 Hz
  Write rate             - 5 Hz
  Transactions started   - 6 Hz
  Transactions committed - 1 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Client time: 10/03/18 17:52:49
I originally created this as triple ssd. I wanted to try various benchmarks with various storage schemes, so at one point I tried changing it to double memory and, when that became unhappy, single memory. That still wasn’t happy, so I changed to single ssd, which is where I am now.
Many things about the status output confuse me:
- What is the relationship between “disk space used” and “sum of key-value sizes”? When I was on triple ssd, the numbers were roughly the same as now (47 GB / 189 GB). I had assumed that “disk space used” would account for the data replication, but it didn’t drop when changing from triple to single. How much memory would I need in order to change to memory? I had assumed that it would be 47 GB + some overhead, but it seems like maybe I would need 180+ GB of memory?
- Why does it say that the operating space for the log server is “1.0 GB free on most full server”? All of the servers have plenty of free disk space (100s of GBs) on the volume where the log directory lives.
- When I refresh status, the “Replication health” tends to cycle between “Healthy” and “Healthy (Rebalancing)” (plus a small nonzero amount of “Moving data”). This is a cluster that is completely idle. What is it doing?
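To make the first question concrete, here is the arithmetic behind my confusion as a small Python sketch. The two sizes are copied from the status output above; the per-mode figures are only my naive assumption that disk usage scales with the number of replicas (1, 2, 3), not something I know FoundationDB to guarantee, and the names `amplification` and `naive_disk_gb` are just for illustration.

```python
# Figures copied from the status output above, in GB.
kv_size_gb = 47.517      # "Sum of key-value sizes"
disk_used_gb = 184.154   # "Disk space used"


def amplification(disk_used, kv_size):
    """Ratio of reported disk usage to logical key-value size."""
    return disk_used / kv_size


# Naive assumption (mine, not from the docs): disk usage is roughly
# the key-value size multiplied by the number of replicas.
replicas_by_mode = {"single": 1, "double": 2, "triple": 3}
naive_disk_gb = {mode: n * kv_size_gb for mode, n in replicas_by_mode.items()}

ratio = amplification(disk_used_gb, kv_size_gb)

print(f"observed disk / key-value ratio: {ratio:.2f}")
for mode, expected in naive_disk_gb.items():
    print(f"naive expectation for {mode}: {expected:.1f} GB")
```

The observed ratio is close to 4 even in single mode, which is higher than what I would have naively expected for triple, hence the question about how much memory the memory engine would actually need.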