A solid 12% boost is very nice! It would be interesting to have some details on the cluster setup you're using and the underlying benchmark here. It would also be interesting to see some resulting data from /proc/*/smaps or something to give a breakdown of huge page usage.
Regarding #909, the core problem seems to be internal accounting that doesn't take 2MB pages into account: it wastes space, but it also produces unpredictable/unknown behavior and performance when a huge page allocation fails and/or "magazine" sizes are mixed. I don't think there's any way to quantify those effects without implementing the necessary accounting and running a lot of tests.
FoundationDB currently allocates huge magazines using MAP_HUGETLB, hence the need for this accounting. It's unclear to me how much malloc or the standard C++ allocator are used in critical paths, but from what I can tell those are separate heaps and codepaths. It would be interesting to see where most of the benefits come from. (Perhaps the upcoming BCC probe work and some bpf magic can help us one day…)
Here's a final thought: rather than forcing transparent huge pages across the system, perhaps FoundationDB can instead call madvise(MADV_HUGEPAGE) on the necessary address ranges. This still allows per-process THP usage while removing the need to account for things – in return, the huge page kernel threads (khugepaged) are needed. I don't know exactly how khugepaged picks pages, but madvise only covers the anonymous mmap(2) regions you explicitly mark, so under systemwide THP khugepaged may be able to do more on its own than explicit madvise calls can, since it will recognize qualifying mappings from anywhere (including the libc allocator, which FoundationDB can't touch.)