Is there a better way to count the number of restarts which fdbmonitor
performs, other than parsing its logged output?
At my company, we export metrics in Prometheus format. status json exposes uptime_seconds per process. A reset of that counter means that fdbserver has been restarted. This has been pretty useful for us to detect OOMs from misconfigured StorageServers.
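To illustrate the idea, here is a minimal sketch in Python, not the code of any exporter mentioned in this thread. It assumes the status json output has already been parsed into a dict and that uptime_seconds sits under cluster.processes.<id>; the function names uptimes_by_process and count_restarts are made up for the example. It is the same logic that PromQL's resets() applies to a series.

```python
from typing import Dict, Iterable


def uptimes_by_process(status: dict) -> Dict[str, float]:
    """Map process ID -> uptime_seconds from a parsed `status json` document.

    Assumes the layout cluster.processes.<id>.uptime_seconds; processes
    without that field are skipped.
    """
    processes = status.get("cluster", {}).get("processes", {})
    return {
        pid: info["uptime_seconds"]
        for pid, info in processes.items()
        if "uptime_seconds" in info
    }


def count_restarts(samples: Iterable[float]) -> int:
    """Count how often consecutive uptime samples decrease (a counter reset)."""
    restarts = 0
    previous = None
    for uptime in samples:
        # Uptime only grows while the process lives, so a drop means a restart.
        if previous is not None and uptime < previous:
            restarts += 1
        previous = uptime
    return restarts
```

Feeding count_restarts the successive uptime_seconds values scraped for one process gives the number of restarts seen in that window. Several restarts between two scrapes show up as a single drop, so the count is a lower bound.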
Thanks, that is a great idea!
I am actually using a fork of your older Go version for the metrics: GitHub - PierreZ/fdb-prometheus-exporter: A FoundationDB Prometheus metrics exporter
aha, I wrote this a looooong time ago, when I was discovering fdb. It is not really maintained anymore, whereas the CleverCloud one is used internally and maintained
Yes, maintained by @AlexandreBrg, who has most (if not all) of the commits
FYI, when there is a crash or termination (via SIGKILL), a new metric will be generated. In this case, Prometheus will not count it as a reset.
Hello,
In my opinion, to count the number of restarts performed by fdbmonitor more efficiently than by parsing its logged output, consider using the FoundationDB status API, which can fetch and monitor restart metrics directly, providing real-time data without log parsing. Another option is to integrate fdbmonitor with external monitoring tools like Prometheus or Grafana, which can be configured to track restart events and display them on a dashboard for easier monitoring.
Which status API are you referring to? The JSON status? It does not contain a count of restarts; can you please provide an example of the information you are referring to?