getMachineLoad and calculation of CPUSeconds for MachineMetrics

wolfspider · August 1, 2018, 4:58am

I’m trying to address an issue right now with the MachineMetrics coming out negative in FreeBSD. I found a couple of problems with the code I have so far (format is different, need to poll all cores) but just need to know if I’m looking at the right thing. From what I can tell testing in Docker and going through the code the CPUSeconds for MachineMetrics are calculated by this:

getMachineLoad(clockIdleTime, clockTotalTime);
returnStats.machineCPUSeconds = clockTotalTime - (*statState)->lastClockTotalTime != 0 ? ( 1 - ((clockIdleTime - (*statState)->lastClockIdleTime) / ((double)(clockTotalTime - (*statState)->lastClockTotalTime)))) * returnStats.elapsed : 0;

Pulling this up in Docker for VSCode I can see on my VM 4 cores are polled and values are totaled:

I’m getting a negative 3 digit number as time progresses but those numbers as uint64_t do increment over time slightly since they are a metric of time. The test code which I use (which is now just on 1 core still for simplicity’s sake) to see this in a console application is as follows:

util = (100ULL * cur[CP_USER] / (sum ? (double)sum : 1.0L));
printf("cpu user utilization: %7.3f\n", util);
util = (100ULL * cur[CP_NICE] / (sum ? (double)sum : 1.0L));
printf("cpu nice utilization: %7.3f\n", util);
util = (100ULL * cur[CP_SYS] / (sum ? (double)sum : 1.0L));
printf("cpu sys utilization: %7.3f\n", util);
util = (100ULL * cur[CP_IDLE] / (sum ? (double)sum : 1.0L));
printf("cpu idle utilization: %7.3f\n", util);

printf("User in uint64_t: %ju\n", (uint64_t)(cur[CP_USER] * 100ULL));
printf("Nice in uint64_t: %ju\n", (uint64_t)(cur[CP_NICE] * 100ULL));
printf("System in uint64_t: %ju\n", (uint64_t)(cur[CP_SYS] * 100ULL));
printf("Idle in uint64_t: %ju\n", (uint64_t)(cur[CP_IDLE] * 100ULL));

The modifier at the end is just to make it equivalent to /proc/stat and there are some other methods to make them the same but I notice that the numbers are incrementing and “system” is too large- the output is:

cpu user utilization: 0.161
cpu nice utilization: 0.000
cpu sys utilization: 0.591
cpu idle utilization: 98.974
User in uint64_t: 28900
Nice in uint64_t: 0
System in uint64_t: 106100
Idle in uint64_t: 17782400

cpu user utilization: 0.161
cpu nice utilization: 0.000
cpu sys utilization: 0.591
cpu idle utilization: 98.973
User in uint64_t: 29000
Nice in uint64_t: 0
System in uint64_t: 106200
Idle in uint64_t: 17796600

Alright, so you get the idea and working on fixing up the number conversion so it’s better but the ratios of the CPU are accurate and have definitely tested to see if that’s correct.

My question is- should CPUSeconds usually come out as 0 and how? The way I read the code is that it compares the time values and if they are different then output the calculated length of time between ProcessMetrics calls. Wouldn’t that always normally output a value? Also, is this the correct place to be testing things? I’ve tried modifying a few things here and there and seem to get the same negative 3 digit number.

wolfspider · August 2, 2018, 5:05am

I believe I may have a possible cause of the issue:

github.com

freebsd/freebsd/blob/ef0cebe7e1319abfe840cc789932f5cc0c54d175/sys/compat/linprocfs/linprocfs.c#L115


#include <compat/linux/linux.h>
#include <compat/linux/linux_mib.h>
#include <compat/linux/linux_misc.h>
#include <compat/linux/linux_util.h>
#include <fs/pseudofs/pseudofs.h>
#include <fs/procfs/procfs.h>


/*
* Various conversion macros
*/
#define T2J(x) ((long)(((x) * 100ULL) / (stathz ? stathz : hz)))	/* ticks to jiffies */
#define T2CS(x) ((unsigned long)(((x) * 100ULL) / (stathz ? stathz : hz)))	/* ticks to centiseconds */
#define T2S(x) ((x) / (stathz ? stathz : hz))		/* ticks to seconds */
#define B2K(x) ((x) >> 10)				/* bytes to kbytes */
#define B2P(x) ((x) >> PAGE_SHIFT)			/* bytes to pages */
#define P2B(x) ((x) << PAGE_SHIFT)			/* pages to bytes */
#define P2K(x) ((x) << (PAGE_SHIFT - 10))		/* pages to kbytes */
#define TV2J(x)	((x)->tv_sec * 100UL + (x)->tv_usec / 10000)


/**
* @brief Mapping of ki_stat in struct kinfo_proc to the linux state

So it looks like ticks and jiffies aren’t being converted properly. Also, different kernel versions have different 0 paddings applied to them:

github.com

freebsd/freebsd/blob/ef0cebe7e1319abfe840cc789932f5cc0c54d175/sys/compat/linprocfs/linprocfs.c#L499


linprocfs_dostat(PFS_FILL_ARGS)
{
	struct pcpu *pcpu;
	long cp_time[CPUSTATES];
	long *cp;
	struct timeval boottime;
	int i;
	char *zero_pad;
	bool has_intr = true;


	if (linux_kernver(td) >= LINUX_KERNVER(2,6,33)) {
		zero_pad = " 0 0 0 0\n";
	} else if (linux_kernver(td) >= LINUX_KERNVER(2,6,24)) {
		zero_pad = " 0 0 0\n";
	} else if (linux_kernver(td) >= LINUX_KERNVER(2,6,11)) {
		zero_pad = " 0 0\n";
	} else if (linux_kernver(td) >= LINUX_KERNVER(2,5,41)) {
		has_intr = false;
		zero_pad = " 0\n";
	} else {
		has_intr = false;

The Docker image gives me this info for uname:

root@48ceb58f7a7c:/home/applefdbvol/bin# uname -a
Linux 48ceb58f7a7c 4.9.87-linuxkit-aufs #1 SMP Wed Mar 14 15:12:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

wolfspider · August 2, 2018, 7:22am

Ok, I solved it the cause of the problem was that Idle just wasn’t being added into the total but using ‘ticks’ just like Darwin does is fine with this equation after all. For this section of the log it’s reading:

Type="MachineMetrics" ID="0000000000000000" Elapsed="5.00392" MbpsSent="0.100129" MbpsReceived="0.133521" OutSegs="579" RetransSegs="0" CPUSeconds="0.110759" TotalMemory="8117676" CommittedMemory="1649936" AvailableMemory="6467740" ZoneID="57dcf28053018043d1374767f01da056" MachineID="57dcf28053018043d1374767f01da056" Machine="192.168.64.2:4689" LogGroup="default" TrackLatestType="Original"

Idle just got left out of the total so it was going into a large negative territory. There are some other things in the logs related to the recent huge allocations but that is for another forum topic.

Topic		Replies	Views
How accurate is readOpsPerSec when readSample is enabled? Using FoundationDB	0	24	March 31, 2025
Curious about CPU utilization of Single-core Write Test #1 Using FoundationDB performance	6	586	August 8, 2022
Memory Available in FDBCLI Development	3	1046	August 25, 2018
Few queries on status json Using FoundationDB	5	1232	December 7, 2018
Understanding `fdbcli status details` output Using FoundationDB	2	645	March 19, 2021

getMachineLoad and calculation of CPUSeconds for MachineMetrics

Related topics