Questions on status json

harikb · August 6, 2021, 11:30pm

Hi,

While reading the documentation for Machine-Readable Status (FoundationDB 6.3), I ran into a few things I couldn’t understand.

- For cluster.processes.[instance].disk, there is free_bytes as well as total_bytes (both are described as quantities of RAM, while the key name is ‘disk’). I assume it is indeed RAM information and the “disk” label is there for symmetry or historical reasons.
- I noticed that, with the comments, it isn’t really valid JSON. Is there a way for me to submit corrections? A pull request on the GitHub repo? For example, a comma is needed after the value for the key “database_lock_state”.
- My db (“6.3.15”) returns a “logs” key under “cluster”. The document above has only “old_logs”. Is this the way keys/metrics are deprecated? Is there any way to get a version of this documentation that matches the released FDB?
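As a rough illustration of the "comments make it invalid JSON" point, here is a minimal Python sketch that strips `//` comments before parsing. The `snippet` text is a hypothetical fragment written for this example, not copied from the documentation, and the helper name `strip_line_comments` is made up here. Once the comments are gone, a remaining problem such as a missing comma would surface as a `json.JSONDecodeError`.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside of JSON string literals."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept on the next pass.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# Hypothetical documentation-style fragment (assumption, for illustration only).
snippet = '''{
    "database_lock_state": { // comment in the style of the docs
        "locked": true
    },
    "note": "a // inside a string is kept"
}'''

parsed = json.loads(strip_line_comments(snippet))
print(parsed["database_lock_state"]["locked"])
```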

Among OSS projects with Prometheus support, this one seems the most comprehensive, although changes from the last 2 years would be missing.
I would also appreciate any other pointers to libraries that deal with status json output.

I do understand that status json doesn’t have any preset schema and all keys can change anytime.

Thanks

Hari

PierreZ · August 10, 2021, 3:29pm

Hi @harikb

I see you stumbled across my Prometheus exporter. I wrote it to quickly export some metrics for a homemade chaos-monkey experiment. I’m not using it, and I’m pretty sure it is not safe (safe in the sense of being sure that the model is correct and that it can deserialize properly across multiple fdb versions).

I will install a fdb cluster in the near future and I may rewrite it to make it more useful. In the meantime, you can have a look at this discussion where you can see how others are monitoring and using the status json.

I guess you can still build the relevant documentation for a tag/branch by running ninja docpreview.

ajbeamon · August 10, 2021, 4:08pm

Free and total bytes are both of the disk, and the comment in our documentation mentioning RAM is incorrect. In addition to these, in the storage and log roles there are some metrics that include space details. These include *_available_bytes (e.g. kvstore_available_bytes), which measures how much space is usable on disk and is in most cases a more interesting number than free_bytes.
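To make the distinction concrete, here is a small Python sketch that pulls the per-process disk numbers and any `kvstore_available_bytes` values out of a status document. The `sample` data and the helper name `disk_report` are invented for illustration, and the layout (a `roles` list inside each process entry) is an assumption that may not hold for every FDB version. In practice the input would come from something like `fdbcli --exec 'status json'`.

```python
import json

# Invented sample shaped like a small slice of status json (assumption).
sample = json.loads('''
{
  "cluster": {
    "processes": {
      "proc-a": {
        "disk": {"free_bytes": 500, "total_bytes": 1000},
        "roles": [{"role": "storage", "kvstore_available_bytes": 700}]
      },
      "proc-b": {
        "disk": {"free_bytes": 200, "total_bytes": 1000},
        "roles": []
      }
    }
  }
}
''')


def disk_report(status):
    """Collect disk free/total bytes and kvstore_available_bytes per process."""
    rows = []
    processes = status.get("cluster", {}).get("processes", {})
    for process_id, process in processes.items():
        disk = process.get("disk", {})
        # Not every role reports kvstore_available_bytes, so filter defensively.
        available = [
            role["kvstore_available_bytes"]
            for role in process.get("roles", [])
            if "kvstore_available_bytes" in role
        ]
        rows.append({
            "process": process_id,
            "free_bytes": disk.get("free_bytes"),
            "total_bytes": disk.get("total_bytes"),
            "kvstore_available_bytes": available,
        })
    return rows


report = disk_report(sample)
for row in report:
    print(row)
```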

Yes, if you would like to submit corrections to the website documentation, feel free to add a pull request. You are correct that the website docs are not valid JSON because of the comments, but besides that I think the intent is that they should be valid. There is also a second file which is meant to be kept in sync with the website docs and which I think should be valid (though I haven’t confirmed this). The two files are:

apple/foundationdb/blob/main/documentation/sphinx/source/mr-status-json-schemas.rst.inc

.. -*- mode: rst; -*-

.. code-block:: javascript

   "cluster":{
        "storage_wiggler": {
            "wiggle_server_ids":["0ccb4e0feddb55"],
            "wiggle_server_addresses": ["127.0.0.1"],
            "primary": { // primary DC storage wiggler stats
                // One StorageServer wiggle round is considered 'complete', when all StorageServers with creationTime < T are wiggled
               "last_round_start_datetime": "2022-04-02 00:05:05.123 +0000",
               "last_round_start_timestamp": 1648857905.123, // when did the latest round start
               "last_round_finish_datetime": "1970-01-01 00:00:00.000 +0000",
               "last_round_finish_timestamp": 0, // when did the latest finished round finish
               "smoothed_round_seconds": 1, // moving average duration of a wiggle round
               "finished_round": 1,
               // 1 wiggle step as 1 storage server is wiggled in the current round
               "last_wiggle_start_datetime": "2022-04-02 00:05:05.123 +0000",
               "last_wiggle_start_timestamp": 1648857905.123, // when did the latest wiggle step start
               "last_wiggle_finish_datetime": "1970-01-01 00:00:00.000 +0000",

and

apple/foundationdb/blob/main/fdbclient/Schemas.cpp

old_logs here refers to the old generation of transaction logs after a recovery occurs and we recruit new ones and is not meant to imply deprecation. It seems that 6.3 changed what we report here and renamed this field to logs, but that is not reflected in the documentation. I’ll see if I can have that updated.
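Since the key name differs between versions, a consumer of status json may want to accept either one. The Python sketch below assumes that both `logs` and `old_logs` sit directly under `cluster` and hold a list; the helper name `transaction_logs` and the sample documents are hypothetical.

```python
def transaction_logs(status):
    """Return (key_name, value) for whichever log key the cluster reports."""
    cluster = status.get("cluster", {})
    # Prefer the newer name, then fall back to the older one.
    for key in ("logs", "old_logs"):
        if key in cluster:
            return key, cluster[key]
    return None, []


# Hypothetical documents; the list contents are placeholders, not real output.
newer = {"cluster": {"logs": [{"current": True}]}}
older = {"cluster": {"old_logs": [{"current": False}]}}

print(transaction_logs(newer)[0])
print(transaction_logs(older)[0])
```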

harikb · August 18, 2021, 8:21pm

Thank you @ajbeamon and @PierreZ
This makes sense.
@PierreZ I will be happy to work with you and contribute to your repo once I do some cleanup.

I also see other structures in Schemas.cpp, like clusterConfigurationSchema and latencyBandConfigurationSchema.
Are these externally exposed?

ajbeamon · August 19, 2021, 6:04pm

The latency band configuration is documented here:

https://apple.github.io/foundationdb/administration.html#server-side-latency-band-tracking

The cluster configuration schema is used for the fileconfigure command in fdbcli, and looks roughly like the configuration section documented in machine readable status.

ajbeamon · August 19, 2021, 6:18pm

This and a variety of other missing fields are updated in this PR: Update the status documentation to include missing fields (apple/foundationdb Pull Request #5420, by sfc-gh-abeamon).

PierreZ · August 21, 2021, 10:40am

We (Clever Cloud) are investing in FDB, but we haven’t deployed a cluster yet. We should start somewhere around September/October. We have some PoC tools to gain observability over a FDB cluster, including:

- a new Prometheus exporter. This will be fairly similar to the one written in Go, but it will be:
  - not tied to a fdb version,
  - capable of generating metrics directly by parsing the json dynamically while recognizing some patterns, like machines/processes keys as tags and enum/messages handling,
  - written in Rust.
- a tracing exporter. This is based on the 7.0 release, which has some tracing features thanks to @markus.pilman’s work. Parsing and sending to Jaeger is done; I just need to figure out how to handle multiple parent spans in Jaeger.
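The "parse the json dynamically" idea can be sketched roughly as follows. This is an illustrative Python sketch, not the Rust exporter described above: the names `flatten`, `to_prometheus` and `LABEL_PARENTS`, the `fdb` prefix, and the sample document are all assumptions made for this example. It walks the document, turns the ids under `processes` and `machines` into labels, and emits every numeric or boolean leaf as a metric.

```python
# Keys whose children are instance ids rather than metric names (assumption).
LABEL_PARENTS = {"processes": "process", "machines": "machine"}


def flatten(node, path=(), labels=None, id_label=None):
    """Yield (metric_name, labels, value) for every numeric or boolean leaf."""
    labels = labels or {}
    if isinstance(node, dict):
        for key, value in node.items():
            if id_label is not None:
                # This key is an instance id: record it as a label, not a path part.
                yield from flatten(value, path, {**labels, id_label: key})
            else:
                yield from flatten(value, path + (key,), labels, LABEL_PARENTS.get(key))
    elif isinstance(node, bool):
        # bool is checked before int because bool is a subclass of int.
        yield "_".join(path), labels, int(node)
    elif isinstance(node, (int, float)):
        yield "_".join(path), labels, node
    # Strings and lists are ignored in this sketch.


def to_prometheus(name, labels, value, prefix="fdb"):
    """Render one sample in the Prometheus text exposition style."""
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    suffix = "{" + label_text + "}" if label_text else ""
    return f"{prefix}_{name}{suffix} {value}"


# Invented sample document, for illustration only.
sample = {
    "cluster": {
        "processes": {"abc": {"disk": {"free_bytes": 500}, "excluded": False}},
        "data": {"total_kv_size_bytes": 42},
    }
}

lines = [to_prometheus(*item) for item in flatten(sample)]
print("\n".join(lines))
```

Enum and message handling, which the list above also mentions, is left out of this sketch.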

Both of them will be open-sourced, once we have them battle-tested on our side.
