Profiling Guide

fble-0.5 (2025-07-13,fble-0.4-212-ga8f8ad0f)

Fble Based Profiling

Most fble binaries accept a --profile option to generate a profile at the end of a run. For example:

fble-stdio --profile foo.prof -p core -m /Core/Stdio/HelloWorld%

This will generate a file foo.prof after the run. The generated file is an uncompressed binary encoded google/pprof proto file. To use it with google/pprof tools, gzip the file first.

Generating Smaller Profiles

The generated profiles can be quite large depending on the program you ran. You can reduce the size of the generated samples using the --profile-sample-period option in addition to --profile. For example:

fble-stdio --profile foo.prof --profile-sample-period 1000 -p core -m /Core/Stdio/HelloWorld%

This behaves as if only 1 in 1000 samples was actually recorded. Increase the sample period to decrease the size of the generated profile, at the expense of some lost information for rarely exercised traces in the program.

You may need some trial and error for your use case to find a good choice of sample period that balances profile size against information loss in the profile.

Viewing Profiles in fble-pprof

The fble-pprof tool can be used to view fble generated profiles. You provide the path to the profile and it launches a simple http server you can use to browse the profile.

For example:

fble-pprof foo.prof

Which outputs something like:

parsing foo.prof...
serving at http://localhost:8123

Visit the URL using your favorite web browser. The following different views of the profile are available from there:

Overview

Shows high level information about the profile, such as number of samples and number of sequences. Page loading performance scales with the number of sequences.

Overall

Shows a breakdown of frames by overall time spent with the frame somewhere in the call stack. Typical usage is to identify any frames with unexpectedly large overall time spent and focus on optimization there.

Self

Shows a breakdown of frames by self time spent in the frame. Typical usage is to focus on optimization of the frames with most self time.

Sequences

Shows a breakdown of full (possibly canonicalized) callstacks by time spent. Gives a rough sense of what parts of the code are taking a lot of time to focus optimization efforts on. This view assumes that sequences are already deduplicated in the profile, which is always the case for fble generated profiles, but may not be the case in general.

Sequence

Viewed when selecting a frame or specific sequence of frames. Shows a breakdown of frames going into this sequence and leading from the sequence. Typical usage is to identify the most relevant callers and callees of a sequence to focus efforts on reducing those calls and/or optimizing the callees.

See the man page for fble-pprof for more details.

Using Linux Perf

You can reuse the fble profiling logic with linux perf profiles. To do so, run you program with the -d and -g flags to perf record. For example:

perf record -F 997 -d -g fble-bench

You can then convert to an fble profile with, for example:

perf script | fble-perf-profile -o bench.prof

See the man page for fble-perf-profile for more details.