FoundationDB

Memory Available in FDBCLI


(Jesse Bennett) #1

I’m trying to track down where fdbcli reads the available memory per process. It was showing up before, but some things have changed a little, and now I see that it isn’t being read in the cli. The memory totals read correctly in the logs as far as I can tell. I don’t have much experience with how the CLI gets this data; where should I focus to find the place it gets this metric from the server?


(A.J. Beamon) #2

The cli is computing and logging this number starting here:

It’s using the status json document queried from the cluster to extract the ‘cluster.processes.memory.available_bytes’ metric. You can see that metric being put into the status document here:

To generate that metric, it’s reading data that’s being logged by two events, “AvailableMemory” from “MachineMetrics”:

and “Memory” from “ProcessMetrics”:

The actual memory statistics (and others) are gathered in this function:


(Jesse Bennett) #3

Ok, thanks! So far things look fine in the logs where memory is concerned. These are the entries from the last run I did. I’ll keep looking around, but this will help a lot:

<Event Severity="10" Time="1535003807.959573" Type="ProcessMetrics" ID="0000000000000000" Elapsed="5.00313" CPUSeconds="0.143641" MainThreadCPUSeconds="0.133873" UptimeSeconds="4111.55" Memory="17934" ResidentMemory="4591" UnusedAllocatedMemory="262144" MbpsSent="0.000460512" MbpsReceived="0.000307008" DiskTotalBytes="7074238464" DiskFreeBytes="3187154944" DiskQueueDepth="0" DiskIdleSeconds="5.00313" DiskReads="0" DiskWrites="91" DiskReadsCount="8241" DiskWritesCount="239116" DiskWriteSectors="0" DiskReadSectors="2032" FileWrites="44" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" ZoneID="7d8f8d4d56c4c20b4d4bb28ff9697649" MachineID="7d8f8d4d56c4c20b4d4bb28ff9697649" AIOSubmitCount="0" AIOCollectCount="0" AIOSubmitLag="0" AIODiskStall="0" CurrentConnections="2" ConnectionsEstablished="0" ConnectionsClosed="0" ConnectionErrors="0" Machine="192.168.64.2:4689" LogGroup="default" TrackLatestType="Original" />

<Event Severity="10" Time="1535003807.959573" Type="MachineMetrics" ID="0000000000000000" Elapsed="5.00313" MbpsSent="0.103174" MbpsReceived="0.139324" OutSegs="428" RetransSegs="0" CPUSeconds="0.0634311" TotalMemory="8117676" CommittedMemory="1942728" AvailableMemory="6174948" ZoneID="7d8f8d4d56c4c20b4d4bb28ff9697649" MachineID="7d8f8d4d56c4c20b4d4bb28ff9697649" Machine="192.168.64.2:4689" LogGroup="default" TrackLatestType="Original" />

(Jesse Bennett) #4

Ok, I figured out what was wrong: the memory is actually read in as bytes. I had recently made an edit to divide the page size by 1024 so the memory totals looked right in the logs, but fdbcli expects bytes after all. The json from status was showing everything too; I didn’t know about that before, and it’s pretty cool. So I tested fdbcli like this:

int64_t minMemoryAvailable = std::numeric_limits<int64_t>::max();
for (auto proc : processesMap.obj()) {
	StatusObjectReader process(proc.second);
	int64_t availBytes;
	if (process.get("memory.available_bytes", availBytes)) {
		printf("process get memory.available_bytes!\n");
		minMemoryAvailable = std::min(minMemoryAvailable, availBytes);
		printf("minMemory:  %jd\n", (intmax_t)minMemoryAvailable);
	}
}

And the output was:

fdb> status

process get memory.available_bytes!
minMemory:  6619982

Heh, when I saw the available_bytes line with an exclamation mark I was like, wait, hold on??
The rest of the code calculates it as follows:

if (minMemoryAvailable < std::numeric_limits<int64_t>::max()) {
	double worstServerGb = minMemoryAvailable / (1024.0 * 1024 * 1024);
	outputString += "\n  Memory availability    - ";
	outputString += format("%.1f GB per process on machine with least available", worstServerGb);
	outputString += minMemoryAvailable < 4294967296 ? "\n                           >>>>> (WARNING: 4.0 GB recommended) <<<<<" : "";
}

6619982 / (1024.0 * 1024 * 1024) is equal to 0.0061653…

So that was effectively zero. What I did was roll back that change, and now I get the correct total again:

int status;

u_int page_size;
u_int free_count;
u_int active_count;
u_int inactive_count;
u_int wire_count;

size_t uint_size;

uint_size = sizeof(page_size);

status = sysctlbyname("vm.stats.vm.v_page_size", &page_size, &uint_size, NULL, 0);
if (status < 0){
    TraceEvent(SevError, "GetMachineMemInfo").GetLastError();
    throw platform_error();
} 

status = sysctlbyname("vm.stats.vm.v_free_count", &free_count, &uint_size, NULL, 0);
if (status < 0){
    TraceEvent(SevError, "GetMachineMemInfo").GetLastError();
    throw platform_error();
}

status = sysctlbyname("vm.stats.vm.v_active_count", &active_count, &uint_size, NULL, 0);
if (status < 0){
    TraceEvent(SevError, "GetMachineMemInfo").GetLastError();
    throw platform_error();
}

status = sysctlbyname("vm.stats.vm.v_inactive_count", &inactive_count, &uint_size, NULL, 0);
if (status < 0){
    TraceEvent(SevError, "GetMachineMemInfo").GetLastError();
    throw platform_error();
}

status = sysctlbyname("vm.stats.vm.v_wire_count", &wire_count, &uint_size, NULL, 0);
if (status < 0){
    TraceEvent(SevError, "GetMachineMemInfo").GetLastError();
    throw platform_error();
}

memInfo.total = (int64_t)((free_count + active_count + inactive_count + wire_count) * (u_int64_t)(page_size));
memInfo.available = (int64_t)(free_count * (u_int64_t)(page_size));
memInfo.committed = memInfo.total - memInfo.available;

Alright, glad that one was easy to fix- thanks again!