fble-0.5 (2025-07-13,fble-0.4-212-ga8f8ad0f)
Most fble binaries accept a --profile option to generate a profile at
the end of a run. For example:
fble-stdio --profile foo.prof -p core -m /Core/Stdio/HelloWorld%
This generates a file foo.prof after the run. The generated file is an
uncompressed, binary-encoded google/pprof proto file. To use it with
google/pprof tools, gzip the file first.
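The gzip step can be sketched as a small shell helper. The function name compress_profile is hypothetical, and the pprof invocation in the usage comment assumes the google/pprof command-line tool is installed separately; neither is part of fble.

```shell
# Compress an fble profile so google/pprof tools can read it.
# Writes <profile>.gz next to the input and leaves the original in place.
compress_profile() {
  gzip -c "$1" > "$1.gz"
}

# Usage, assuming foo.prof was produced with --profile:
#   compress_profile foo.prof
#   pprof -top foo.prof.gz    # assumes pprof is installed
```

Keeping the uncompressed foo.prof around means the same file can still be opened with the fble tools.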
The generated profiles can be quite large depending on the program you ran.
You can reduce the profile size using the --profile-sample-period option in
addition to --profile. For example:
fble-stdio --profile foo.prof --profile-sample-period 1000 -p core -m /Core/Stdio/HelloWorld%
This behaves as if only 1 in 1000 samples was actually recorded. Increase the sample period to decrease the size of the generated profile, at the expense of some lost information for rarely exercised traces in the program.
You may need some trial and error for your use case to find a good choice of sample period that balances profile size against information loss in the profile.
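One way to run that trial and error is to generate a profile at several sample periods and compare the file sizes. The sketch below uses only the flags shown above; the function name profile_sizes and the foo.PERIOD.prof naming scheme are illustrative assumptions.

```shell
# profile_sizes PERIOD...
# Runs HelloWorld once per sample period and prints "<period> <bytes>"
# for each generated profile.
profile_sizes() {
  for period in "$@"; do
    prof="foo.$period.prof"
    fble-stdio --profile "$prof" --profile-sample-period "$period" \
      -p core -m /Core/Stdio/HelloWorld% || return 1
    # tr strips the padding some wc implementations add around the count.
    printf '%s %s\n' "$period" "$(wc -c < "$prof" | tr -d ' ')"
  done
}

# Usage:
#   profile_sizes 1 10 100 1000
```

Size is only half of the trade-off. It is still worth opening the smaller profiles to check that the traces you care about are present.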
fble-pprof
The fble-pprof tool can be used to view fble generated profiles. You provide
the path to the profile and it launches a simple http server you can use to
browse the profile.
For example:
fble-pprof foo.prof
This outputs something like:
parsing foo.prof... serving at http://localhost:8123
Visit the URL using your favorite web browser. The following views of the profile are available from there:
Shows high level information about the profile, such as number of samples and number of sequences. Page loading performance scales with the number of sequences.
Shows a breakdown of frames by overall time spent with the frame somewhere in the call stack. Typical usage is to identify any frames with unexpectedly large overall time spent and focus on optimization there.
Shows a breakdown of frames by self time spent in the frame. Typical usage is to focus on optimization of the frames with most self time.
Shows a breakdown of full (possibly canonicalized) callstacks by time spent. Gives a rough sense of what parts of the code are taking a lot of time to focus optimization efforts on. This view assumes that sequences are already deduplicated in the profile, which is always the case for fble generated profiles, but may not be the case in general.
Viewed when selecting a frame or specific sequence of frames. Shows a breakdown of frames going into this sequence and leading from the sequence. Typical usage is to identify the most relevant callers and callees of a sequence to focus efforts on reducing those calls and/or optimizing the callees.
See the man page for fble-pprof for more details.
You can reuse the fble profiling logic with linux perf profiles.
To do so, run your program with the -d and -g flags to perf record. For example:
perf record -F 997 -d -g fble-bench
You can then convert to an fble profile with, for example:
perf script | fble-perf-profile -o bench.prof
See the man page for fble-perf-profile for more details.
You can adjust the -F sampling frequency up or down if perf ends up with too many samples to process; a lower frequency records fewer samples.
Look for FbleNewFuncValue calls and check whether they are expected. If not, use regular fble profiling to track down what is allocating functions unexpectedly and factor those allocations out. Allocating a large number of functions is expensive and normally not necessary in a program.
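As a rough first check before opening the profile, you can count how many stack frame lines in the perf script output mention FbleNewFuncValue. This is only a sketch: the helper name count_func_allocs is hypothetical, and it assumes perf script prints one frame per line with symbol names, as it does for call graphs recorded with -g.

```shell
# Count lines on stdin that mention FbleNewFuncValue.
# Prints 0 when there are no matches.
count_func_allocs() {
  grep -c 'FbleNewFuncValue'
}

# Usage, run in the directory containing perf.data:
#   perf script | count_func_allocs
```

A high count relative to the total number of samples suggests function allocation is worth a closer look in the profile.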