Profiling FoundationDB

Because of FoundationDB’s unique coroutine-based architecture, profiling it can be tricky. I’d like to share some methods and tips we use at Snowflake to profile FoundationDB efficiently.

Isolate the Roles
The very first thing to do is isolate the roles, so you can identify which role is the bottleneck for your workload. If you are running FoundationDB on a single machine (a development machine, for example), binding each process to a specific CPU and watching the machine with htop or top -H -c makes the hot process easy to spot.

For example:

[fdbserver.4500]
command = /usr/bin/numactl --physcpubind=0 fdbserver
class = master

[fdbserver.4501]
command = /usr/bin/numactl --physcpubind=1 fdbserver
class = proxy

[fdbserver.4502]
command = /usr/bin/numactl --physcpubind=2 fdbserver
class = resolution

… and so on …
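If you have many processes, writing these sections by hand gets tedious. Here is a minimal sketch that prints sections in the same layout as the example above. The function name gen_fdb_sections is made up for illustration, and it assumes you want the i-th process pinned to physical CPU i, which may not match your NUMA layout.

```shell
# Hypothetical helper (not part of FoundationDB).
# Usage: gen_fdb_sections <first_port> <class>...
# Prints one [fdbserver.<port>] section per class, pinning the i-th
# process to physical CPU i, as in the hand-written example.
gen_fdb_sections() (
  port=$1
  shift
  cpu=0
  for class in "$@"; do
    printf '[fdbserver.%d]\n' "$port"
    printf 'command = /usr/bin/numactl --physcpubind=%d fdbserver\n' "$cpu"
    printf 'class = %s\n\n' "$class"
    port=$((port + 1))
    cpu=$((cpu + 1))
  done
)

# Prints sections for ports 4500, 4501 and 4502
gen_fdb_sections 4500 master proxy resolution
```

Review the output before pasting it into foundationdb.conf.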

perf
https://perf.wiki.kernel.org/

perf is one of the most commonly used profiling tools on Linux, and it works perfectly for FoundationDB. Once you identify the hot fdbserver process, you can use perf top to take a quick look at the top functions.

perf top -p <fdbserver PID>

If you need to drill down even further, use perf record and perf report.

perf record --call-graph dwarf -o <profile.data> -p <fdbserver PID>

This collects a CPU-cycle profile and saves it to <profile.data>. To look at a metric other than CPU cycles, pass the -e <event> option (for example, -e cache-misses). For the complete list of events supported on your system, run perf list.

--call-graph dwarf tells perf to use DWARF’s CFI (Call Frame Information) to collect call-graph information instead of using the frame pointer (the default in perf).

If your FoundationDB binary does not contain debug information (e.g. a release binary), you need to either install the debuginfo package globally, or extract the debuginfo data to an arbitrary location and pass --symfs pointing to the directory that contains your usr/lib/debug.

Once you have collected the profile, use perf report to navigate through the data.

perf report -g -i <profile.data> --no-children

This opens an interactive viewer in the bottom-up view, which we found more useful for identifying bottlenecks. Without --no-children, perf opens a top-down view.

Tip:
If you specify --stdio, or pipe the perf report output to a file or another command such as less, perf generates an expanded text view. This is useful if you want to share the output with someone.

Tip:
--call-graph dwarf may generate very large profile data. You may need to start with a lower sampling rate by specifying --freq explicitly.
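To put the options above together, here is a minimal sketch of a wrapper around perf record. The function name profile_fdb, the output file naming, the DRY_RUN switch, and the defaults (99 Hz, 30 seconds) are my own choices for illustration, not something perf or FoundationDB provides. The trailing "-- sleep <seconds>" is just a common way to make perf stop by itself after a fixed time.

```shell
# Hypothetical wrapper (not part of FoundationDB or perf).
# Usage: profile_fdb <pid> [event] [freq_hz] [seconds]
# Set DRY_RUN=1 to print the perf command instead of running it.
profile_fdb() (
  pid=$1
  event=${2:-cycles}   # any event name listed by `perf list`
  freq=${3:-99}        # samples per second; lower means smaller output
  secs=${4:-30}        # how long to sample before perf exits
  # Note: event names containing '/' or ':' would need a different file name
  out="perf.$pid.$event.data"
  ${DRY_RUN:+echo} perf record -e "$event" --call-graph dwarf \
    --freq "$freq" -o "$out" -p "$pid" -- sleep "$secs"
)

# Example: preview the command for PID 12345, sampling cache misses
DRY_RUN=1 profile_fdb 12345 cache-misses
```

Drop DRY_RUN=1 once the printed command looks right, and lower the frequency further if the data file is still too large.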

Valgrind
http://valgrind.org

Another profiler we regularly use is Valgrind. Valgrind is a suite of tools that includes profilers such as cachegrind (cache usage) and callgrind (CPU usage).

You cannot attach Valgrind to an already running process, so you need to start the fdbserver process under valgrind manually. To do so, first comment out your process in the foundationdb.conf file. As soon as you make the change, the fdbmonitor process will automatically stop that fdbserver process.

Tip:
Run “ps -ef | grep fdbserver” before stopping it, and save the exact command line, so you can copy and paste it as valgrind arguments later.
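If you prefer not to copy from the ps output by hand, the same information is available in /proc on Linux. This is only a sketch: the function name callgrind_cmd_for is made up, and it assumes a Linux /proc filesystem and arguments that do not contain spaces.

```shell
# Hypothetical helper (Linux only): print a callgrind command line for a
# running process, built from /proc/<pid>/cmdline.
# Caveat: arguments containing spaces lose their quoting in the output.
callgrind_cmd_for() {
  # cmdline stores the arguments separated by NUL bytes
  args=$(tr '\000' ' ' < "/proc/$1/cmdline")
  # Drop the trailing space left by the final NUL
  printf 'valgrind --tool=callgrind %s\n' "${args% }"
}

# Example: show what the command would look like for the current shell
callgrind_cmd_for "$$"
```

Run it with the fdbserver PID before commenting the process out of foundationdb.conf, and save the printed line for later.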

Then, run fdbserver through valgrind.

valgrind --tool=callgrind /path/to/fdbserver --cluster_file ...

Wait until fdbserver joins the cluster, then run your workload.
When the workload is finished, or you have collected enough data, stop valgrind by pressing Ctrl-C. It will generate a file named like callgrind.out.<pid>.

Once you have the valgrind output file, you can copy it to your local machine, open it with kcachegrind or qcachegrind, and analyze it visually.

Here’s a screenshot of qcachegrind on my mac.

We’d like to hear your tips too!
I didn’t describe it here, but I also use the Intel Pin tool sometimes, especially when collecting instruction counts. Good old pstack is sometimes good enough (and super quick!). I’m pretty sure other people in this forum have different methods and tips. Please share a little bit of your secrets too.


Thanks for the detailed instructions!

I’ll add another memory profiling option using gperftools here:

  1. Compile with gperftools enabled
cmake -DUSE_GPERFTOOLS=1 ../foundationdb
make
  2. Run with gperftools enabled
HEAPPROFILE=/tmp/fdbserver fdbserver [args...]

A client-triggered heap profile dump for the server at 127.0.0.1:4500 looks like this:

fdb> profile
ERROR: Usage: profile <client|list|flow|heap>
fdb> profile heap 127.0.0.1:4500

The server-side session looks like this:

$ HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C ../test.cluster -p 127.0.0.1:4500
Starting tracking the heap
FDBD joined cluster.
Dumping heap profile to /tmp/fdbserver.0001.heap (1024 MB allocated cumulatively, 13 MB currently in use)
Dumping heap profile to /tmp/fdbserver.0002.heap (User triggered heap dump)

The fdbserver binary has a command-line flag that triggers a heap dump when RSS memory exceeds a threshold. The default dump interval is 30s (controlled by the knob HEAP_PROFILER_INTERVAL). For example:

HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C cluster_file -p 127.0.0.1:4500 --rsssize=500MiB
  3. Visualize the heap dump with pprof
pprof -call_tree -nodecount=400 -web path_to_fdb_binary/fdbserver /tmp/fdbserver.0023.heap
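
When a server has been dumping for a while, you usually want the most recent file. A minimal sketch for picking it, assuming the dumps keep the zero-padded numbering shown in the server log above (the function name latest_heap_dump is made up for illustration):

```shell
# Hypothetical helper: print the newest heap dump for a HEAPPROFILE prefix.
# Relies on the zero-padded sequence number (0001, 0002, ...) so that a
# plain lexical sort puts the dumps in order.
latest_heap_dump() {
  ls "$1".*.heap 2>/dev/null | sort | tail -n 1
}

# Example (paths are placeholders):
#   pprof -call_tree -nodecount=400 -web path_to_fdb_binary/fdbserver \
#     "$(latest_heap_dump /tmp/fdbserver)"
```

It prints nothing if no dump exists for that prefix, so check the result before passing it to pprof.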

Late follow-up, but just in case someone comes across this: sometimes perf record captures more data than perf can ingest in a reasonable amount of time. Adding --no-inline to perf script, perf report, etc. can let perf read the recording in a reasonable amount of time.
