We are now using FoundationDB as the database for our application, and we need to monitor its performance. Are there any FoundationDB performance metrics we can monitor? And are there any tools, commands, APIs, or GUIs to check and monitor these metrics? Thanks!
We don’t have good documentation on this topic. Most metrics are captured in the trace log files written by fdbserver processes, e.g., ProcessMetrics, MachineMetrics, NetworkMetrics, ProxyMetrics, TLogMetrics, StorageMetrics. Each of these events has many fields that expose information. There are also many histograms in the log files. Typically, you want to import these log files into a time-series database or log analytics system (e.g., Splunk) and then query them there.
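As a rough illustration of the import step, here is a minimal Python sketch that pulls selected events out of a trace log and turns them into flat records that a time-series database could ingest. It assumes the default XML trace format, where each event is a single `<Event ... />` line with a `Type` attribute. The sample lines and field values below are made up for illustration, and `extract_points` is a hypothetical helper, not part of FoundationDB.

```python
import xml.etree.ElementTree as ET

def extract_points(lines, wanted_types):
    """Return one dict per <Event .../> line whose Type is in wanted_types.

    Attribute values that look numeric are converted to float; everything
    else (addresses, names, multi-value strings) is kept as a string.
    """
    points = []
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue  # skip the XML header, the <Trace> wrapper, blank lines
        try:
            event = ET.fromstring(line)
        except ET.ParseError:
            continue  # e.g. a partially written last line in a live log
        if event.get("Type") not in wanted_types:
            continue
        point = {}
        for name, value in event.attrib.items():
            try:
                point[name] = float(value)
            except ValueError:
                point[name] = value
        points.append(point)
    return points

# Illustrative input only: real events carry many more fields.
SAMPLE = [
    '<?xml version="1.0"?>',
    '<Trace>',
    '<Event Severity="10" Time="1700000000.5" Type="ProcessMetrics" '
    'Machine="10.0.0.1:4500" CPUSeconds="1.25" Elapsed="5.0" />',
    '<Event Severity="10" Time="1700000001.0" Type="SomethingElse" '
    'Machine="10.0.0.1:4500" />',
    '<Event Severity="10" Time="17000',  # truncated line, ignored
]

points = extract_points(SAMPLE, {"ProcessMetrics", "StorageMetrics"})
```

In practice you would read the lines from the trace files in the fdbserver log directory and forward `points` to whatever storage you use. If the cluster is configured to write JSON trace files instead, replace the XML parsing with `json.loads` per line.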
Another way is to use `status json` from fdbcli, which summarizes a lot of information about the database. Because generating the JSON is costly for the server, it should probably be run only about once every 5 seconds or so.
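A minimal Python sketch of polling `status json` could look like the following. It assumes `fdbcli` is on the PATH and that `fdbcli --exec "status json"` prints only the JSON document. The key paths in `WATCHED` are examples of fields I believe appear in the status document, but treat them as assumptions and check them against the output of your own cluster. `fetch_status`, `dig`, `summarize`, and `poll` are hypothetical helper names.

```python
import json
import subprocess
import time

# Example key paths (assumed; verify against your cluster's status output).
WATCHED = [
    "cluster.qos.performance_limited_by.name",
    "cluster.qos.worst_queue_bytes_storage_server",
    "cluster.qos.worst_queue_bytes_log_server",
    "cluster.latency_probe.commit_seconds",
    "cluster.data.total_kv_size_bytes",
]

def fetch_status(cluster_file=None, timeout_s=30):
    """Run fdbcli once and return the parsed status document."""
    cmd = ["fdbcli"]
    if cluster_file:
        cmd += ["-C", cluster_file]
    cmd += ["--exec", "status json"]
    result = subprocess.run(cmd, capture_output=True, text=True,
                            check=True, timeout=timeout_s)
    return json.loads(result.stdout)

def dig(doc, path, default=None):
    """Follow a dotted path through nested dicts; return default if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

def summarize(status, paths=WATCHED):
    """Map each watched path to its value (None when the field is missing)."""
    return {path: dig(status, path) for path in paths}

def poll(interval_s=5.0, cluster_file=None):
    """Print a summary every interval_s seconds until interrupted."""
    while True:
        print(summarize(fetch_status(cluster_file)))
        time.sleep(interval_s)
```

Calling `poll()` prints one summary roughly every 5 seconds, in line with the interval suggested above. Missing fields come back as `None` rather than raising, since the status document can omit sections when parts of the cluster are unavailable.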
The information to monitor typically includes: CPU by role, memory, disk usage/space and dataset size, network connections and retransmits, client- and server-side latency (commit, GRV, read), throughput, availability (measured by probes), backup rate and status, TLog queue, storage queue, storage cache hit ratio, ratekeeper throttling reason and rate, data distribution move rate, reachable coordinators, etc.