FoundationDB

A few queries on status json


(gaurav) #1

Hi,

I am trying to understand the fields in status json for monitoring purposes. There are a few fields that I could not understand, and it would be great if someone could help out with these:

  1. cluster->machines->id->memory->committed_bytes => What does this field convey? And what operating system counter/metric is it using to get its value?

  2. cluster->processes->id->memory->used_bytes => What does this field convey? And what operating system counter/metric is it using to get its value (RES memory or something else)?

  3. cluster->workload->keys->read => What is this field for and how is it different from cluster->workload->operations->read?

  4. cluster->workload->bytes->read/write => what are these fields for? Are they indicating data exchange rate (read and written) at the interface of fdbserver, or does it indicate data written out by fdbservers to disk? How is the field useful for monitoring?

Is there any field that tells me the per-process or per-machine disk read and write rate in bytes?


(A.J. Beamon) #2

This is the memory on the machine that is not regarded as available. On Linux it comes from /proc/meminfo, and you can see its computation here:

This is the virtual memory size of the process, and it comes from /proc/self/statm on Linux. You can see it being collected here:

This is the number of keys read, expressed as a triple of rate, roughness, and total. The rate is measured per second and the total is the sum over all storage processes for their current lifetimes. Roughness is a measure of how evenly spaced or bursty the metric is, with a roughness of 1 corresponding to evenly spaced events and a higher roughness representing events that come in spurts.

The operations metric measures the number of operations, ignoring how many keys each returns. For example, a get range that returns 10 keys counts as 10 in the first metric and 1 in the second.
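To make the keys-vs-operations distinction concrete, a monitoring script can divide the two rates to get the average number of keys returned per read operation. This is a minimal sketch: the field paths are the ones discussed in this thread, and the sample values are hypothetical.

```python
# Hypothetical status json fragment; only the fields relevant here.
status = {
    "cluster": {
        "workload": {
            "keys": {"read": {"hz": 186.4}},
            "operations": {"reads": {"hz": 65.3}},
        }
    }
}

workload = status["cluster"]["workload"]
keys_per_sec = workload["keys"]["read"]["hz"]
ops_per_sec = workload["operations"]["reads"]["hz"]

# A ratio well above 1 suggests clients are mostly issuing range reads:
# e.g. a get range returning 10 keys counts 10 in keys.read but 1 in
# operations.reads.
avg_keys_per_read = keys_per_sec / ops_per_sec
```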

For reads, this is the same as above except that instead of reporting the number of keys returned, it reports the size of the data returned.

For writes, this is reporting the logical size of the mutations being written to the database (i.e. key+value size, ignoring replication, disk overhead, etc.).

These fields give you an idea how much work the clients of a database are doing in aggregate.
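A monitoring script can pull these aggregate client-workload rates out of the status document in one pass. The sketch below assumes `fdbcli --exec "status json"` is available (the cluster-file path shown is a common default and may differ in your environment); `client_workload` itself just walks the field paths named in this thread.

```python
import json
import subprocess

def fetch_status(cluster_file="/etc/foundationdb/fdb.cluster"):
    # "status json" prints the machine-readable status document.
    # The cluster-file path is an assumption; adjust for your setup.
    out = subprocess.check_output(
        ["fdbcli", "-C", cluster_file, "--exec", "status json"]
    )
    return json.loads(out)

def client_workload(status):
    # Extract the aggregate client workload rates discussed above.
    w = status["cluster"]["workload"]
    return {
        "read_bytes_per_sec": w["bytes"]["read"]["hz"],
        "written_bytes_per_sec": w["bytes"]["written"]["hz"],
        "read_ops_per_sec": w["operations"]["reads"]["hz"],
        "write_ops_per_sec": w["operations"]["writes"]["hz"],
    }
```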

cluster->processes->id->disk->reads/writes->sectors will tell you the number of sectors being read or written, which are typically 512 bytes in size (though you may want to confirm that’s the case in your environment).
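Converting those sector counts to bytes is then a single multiplication. A small sketch, assuming the typical 512-byte sector size mentioned above (confirm it for your environment):

```python
SECTOR_BYTES = 512  # typical sector size; confirm for your environment

def sectors_to_bytes(process_disk):
    # process_disk is the "disk" section for one process in status json.
    # Multiplies the reads/writes sector counts by the assumed sector size.
    return {
        "read_bytes": process_disk["reads"]["sectors"] * SECTOR_BYTES,
        "write_bytes": process_disk["writes"]["sectors"] * SECTOR_BYTES,
    }
```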


(gaurav) #3

Thank you for the detailed explanation. This is very helpful.


(gaurav) #4

Hi, while writing the monitoring scripts, I could not understand the precise semantics of the process.disk counters:

"disk" : {
    "busy" : 0.047199600000000001,
    "free_bytes" : 827432890368,
    "reads" : {
        "counter" : 8447124,
        "hz" : 17.1999,
        "sectors" : 688
    },
    "total_bytes" : 914706128896,
    "writes" : {
        "counter" : 188700975,
        "hz" : 183.19900000000001,
        "sectors" : 48352
    }
}

Specifically:

  • busy: what does this indicate? I tried making the disk perform a lot of IO using dd commands in parallel; the iostat command showed %util close to 98%, whereas the busy counter remained in the low single digits.
  • reads/writes.hz: do these indicate the number of read/write IOPS done by the specific fdbserver process (and not the total IOPS being issued against the disk, including those done by any non-fdb processes)? If so, how are they related to (or different from) the sectors counter?

And a few more questions on workload counters:

"workload" : {
    "bytes" : {
        "read" : {
            "counter" : 2613703043638,
            "hz" : 23296.599999999999,
            "roughness" : 9629.0699999999997
        },
        "written" : {
            "counter" : 3954328153,
            "hz" : 968.18499999999995,
            "roughness" : 901.48900000000003
        }
    },
    "keys" : {
        "read" : {
            "counter" : 41657599527,
            "hz" : 186.39699999999999,
            "roughness" : 114.771
        }
    },
    "operations" : {
        "read_requests" : {
            "counter" : 1110279557,
            "hz" : 652.99000000000001,
            "roughness" : 4.3613800000000005
        },
        "reads" : {
            "counter" : 1110279557,
            "hz" : 652.99000000000001,
            "roughness" : 4.3613800000000005
        },
        "writes" : {
            "counter" : 31491107,
            "hz" : 4.5999300000000005,
            "roughness" : 4.2830500000000002
        }
    }
}

There are three seemingly similar counters:

  1. keys.read
  2. operations.read_requests
  3. operations.reads

@ajbeamon explained above that keys->read is the number of keys read by all storage servers, whereas operations->reads is the number of read operations (including range reads) issued by clients.
What does operations.read_requests indicate (assuming my understanding of the other two counters is correct)?


(A.J. Beamon) #5

This gives a fraction of time the disk associated with a process’s data directory is doing IOs over what is typically a 5 second window. On Linux, it comes from reading /proc/diskstats. I don’t have a good explanation for why you saw what you did unless you happened to be doing your IOs on a different drive.

This is the rate that reads and writes are being performed on the disk and is not specific to a single process. The sectors counter tells you how many sectors are being read or written to by those reads or writes, as each read or write may consist of multiple sectors. This comes from the same code linked above. It seems that in status, though, we report the read/write hz as a rate per second (as implied by the name), but sectors is reported as the number that occurred over a ~5 second window (the exact window size is not reported). That seems wrong to me, so we should probably change that to be a rate and/or a counter.
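Given that the window behind the reported sectors value is not exposed, a monitoring script can sidestep it by differencing the cumulative `counter` values between two of its own polls, which yields an IOPS rate over a window it controls. A sketch, where `get_status` is assumed to be any callable returning a parsed status json document and the process id is hypothetical:

```python
import time

def read_iops(get_status, process_id, interval_sec=5.0):
    # Compute read IOPS for the disk behind one process by differencing
    # the cumulative reads counter between two polls, rather than relying
    # on the reported hz/sectors pair with its unspecified window.
    def read_counter(status):
        proc = status["cluster"]["processes"][process_id]
        return proc["disk"]["reads"]["counter"]

    before = read_counter(get_status())
    time.sleep(interval_sec)
    after = read_counter(get_status())
    return (after - before) / interval_sec
```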

Read requests measure incoming requests, the other two measure metrics for completed requests. The former may differ from the latter if the cluster can’t keep up with all of the reads.
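This distinction suggests a simple saturation signal for a monitoring script: the gap between the incoming and completed read rates. A minimal sketch, using only the field paths shown earlier in this thread:

```python
def read_backlog_rate(workload):
    # read_requests counts incoming reads; reads counts completed ones.
    # A persistently positive difference suggests the cluster is not
    # keeping up with the read load.
    ops = workload["operations"]
    return ops["read_requests"]["hz"] - ops["reads"]["hz"]
```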


(gaurav) #6

Thanks AJ! I was mistakenly running the dd command against a different disk. Your explanation matches the corrected test results.