We’ve discussed hand-picking the segments that should use THP via madvise(), but we first wanted to see how much THP would give us without any code changes. I fully agree that madvise() is preferable in general. We also need to be more careful about memory alignment in order to fully utilize THP. Here’s an smaps summary for a storage server process (from our development machine, not from the benchmark). The first column is the number of segments with the given AnonHugePages value; 170 segments are too small or unaligned and get no huge pages at all.
170 AnonHugePages: 0 kB
7 AnonHugePages: 2048 kB
3 AnonHugePages: 4096 kB
4 AnonHugePages: 6144 kB
1 AnonHugePages: 96256 kB
1 AnonHugePages: 223232 kB
1 AnonHugePages: 919552 kB
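To make the alignment point concrete, here is a minimal sketch of what opting a single buffer into THP could look like. It is not code from our server: the helper names `round_up_huge` and `alloc_thp_hint` are hypothetical, and the 2 MiB huge page size is an assumption that holds on typical x86-64 Linux. The idea is simply to make both the start address and the length multiples of the huge page size before calling madvise() with MADV_HUGEPAGE.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumption: 2 MiB transparent huge pages (typical on x86-64 Linux). */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Round a size up to the next multiple of the huge page size. */
static size_t round_up_huge(size_t size) {
    return (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: allocate a buffer whose start and length are both
 * multiples of the huge page size, then ask the kernel to back it with THP.
 * madvise() is only a hint; it can fail (for example when THP is disabled
 * or not compiled into the kernel), so the result is reported via *advised
 * and the buffer is still returned and usable either way.
 */
static void *alloc_thp_hint(size_t size, int *advised) {
    void *p = NULL;
    size_t len = round_up_huge(size);

    if (posix_memalign(&p, HUGE_PAGE_SIZE, len) != 0)
        return NULL;

    int rc = madvise(p, len, MADV_HUGEPAGE);
    if (advised)
        *advised = (rc == 0);
    return p;
}
```

A caller would release the buffer with free(). Whether the kernel actually uses huge pages for it can be checked the same way as above, by looking at AnonHugePages for the corresponding segment in /proc/<pid>/smaps.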
For the benchmark, we used a minimal triple-redundancy cluster with 3 tlogs and 3 storage servers on 6 different i3 instances. The benchmark itself is our in-house C program. A tester workload version of the same benchmark is almost ready to be merged.