Transparent Huge Pages Performance Impact

We’ve discussed hand-picking the segments that should use THP via madvise(), but we first wanted to see how much THP would give us without any code changes. I fully agree that madvise() is preferable in general. We also need to be more careful about memory alignment in order to fully utilize THP. Here’s an smaps summary for a storage server process (not from the benchmark, but from our development machine). There are 170 small or unaligned segments that get no huge pages at all.

170 AnonHugePages:         0 kB
  7 AnonHugePages:      2048 kB
  3 AnonHugePages:      4096 kB
  4 AnonHugePages:      6144 kB
  1 AnonHugePages:     96256 kB
  1 AnonHugePages:    223232 kB
  1 AnonHugePages:    919552 kB
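To illustrate the alignment point: a segment can only be backed by huge pages where it covers a whole, aligned huge page. Below is a minimal sketch of opting a single region in via madvise(). It is not taken from our code. The helper names (round_up_huge, alloc_thp_region) are hypothetical, and the 2 MiB huge page size is an assumption that holds on typical x86-64 systems but not everywhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages (typical for x86-64). */
#define HUGE_PAGE_SIZE ((size_t)2 << 20)

/* Round len up to the next multiple of the huge page size. */
static size_t round_up_huge(size_t len) {
    return (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: allocate a region that starts on a huge page
 * boundary and spans a whole number of huge pages, then ask the kernel
 * to back it with THP. The hint is advisory, so a failing madvise() is
 * reported but not treated as fatal. Release the region with free().
 */
static void *alloc_thp_region(size_t len, size_t *out_len) {
    size_t rounded = round_up_huge(len);
    void *p = NULL;

    if (posix_memalign(&p, HUGE_PAGE_SIZE, rounded) != 0)
        return NULL;
    assert(((uintptr_t)p % HUGE_PAGE_SIZE) == 0);

#ifdef MADV_HUGEPAGE
    if (madvise(p, rounded, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");
#endif

    if (out_len != NULL)
        *out_len = rounded;
    return p;
}
```

With THP set to "madvise" mode in /sys/kernel/mm/transparent_hugepage/enabled, only regions hinted this way are eligible, which is what makes hand-picking possible. Whether the kernel actually backs the region with huge pages still depends on memory availability and fragmentation, so the AnonHugePages lines in smaps remain the way to check.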

For the benchmark, we used a minimal triple-redundancy cluster with 3 tlogs and 3 storage servers on 6 different i3 instances. The benchmark itself is our in-house C program. We have a tester workload version of the same benchmark almost ready to be merged.