You should definitely try GNU Gprof. It is a code profiler, so it shows you which parts of the program take the most execution time.
You start by compiling your program with two extra flags: -g, which enables debug symbols, and -pg, which makes the compiler emit the extra code the profiler needs. The -pg flag must also be passed when linking. For example:
g++ -g -pg -o example example.cc
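If you don't have a program at hand to try this on, here is a minimal, hypothetical example.cc with an obvious hot spot. It is only a stand-in; the profiler output shown further below comes from a different program, a Life simulation.
// example.cc -- hypothetical toy program with a deliberately slow function,
// used only to try out the compile/run/profile workflow described here.
#include <iostream>

// Intentionally wasteful: recomputes the sum from scratch on every call,
// so it should dominate the flat profile.
long slow_sum(long n) {
    long total = 0;
    for (long i = 0; i < n; ++i)
        total += i;
    return total;
}

// Closed-form version; should barely register in the profile.
long fast_sum(long n) {
    return n * (n - 1) / 2;
}

int main() {
    long a = 0, b = 0;
    for (int i = 0; i < 2000; ++i) {
        a += slow_sum(1000000);
        b += fast_sum(1000000);
    }
    std::cout << a << " " << b << "\n";
    return 0;
}
Compiled with the command above and run once, this should produce a flat profile dominated by slow_sum.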
When you run the program, it writes a file called gmon.out into the current working directory. This file contains the raw profiling data. It is then analyzed with gprof, which produces a report with the most relevant information:
gprof example > example.out
The report starts with a flat profile of the program, which looks like this:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 37.50      0.15     0.15    48000     3.12     3.12  Life::neighbor_count(int, int)
 17.50      0.22     0.07                             _IO_do_write
 10.00      0.26     0.04                             __overflow
  7.50      0.29     0.03                             _IO_file_overflow
  7.50      0.32     0.03                             _IO_putc
  5.00      0.34     0.02       12  1666.67 14166.67  Life::update(void)
  5.00      0.36     0.02                             stdiobuf::overflow(int)
  5.00      0.38     0.02                             stdiobuf::sys_write(char const *, int)
  2.50      0.39     0.01                             ostream::operator<<(char)
  2.50      0.40     0.01                             internal_mcount
  0.00      0.40     0.00       12     0.00     0.00  Life::print(void)
  0.00      0.40     0.00       12     0.00     0.00  to_continue(void)
  0.00      0.40     0.00        1     0.00     0.00  Life::initialize(void)
  0.00      0.40     0.00        1     0.00     0.00  instructions(void)
  0.00      0.40     0.00        1     0.00 170000.00 main
In this example (taken from here) the biggest bottleneck of the program is obviously Life::neighbor_count().
Gprof also creates a call graph of the functions.
Call graph (explanation follows)

granularity: each sample hit covers 4 byte(s) for 2.50% of 0.40 seconds

index % time    self  children    called     name
                0.02    0.15      12/12       main [2]
[1]     42.5    0.02    0.15      12          Life::update(void) [1]
                0.15    0.00   48000/48000        Life::neighbor_count(int, int) [4]
-----------------------------------------------
                0.00    0.17       1/1        _start [3]
[2]     42.5    0.00    0.17       1          main [2]
                0.02    0.15      12/12           Life::update(void) [1]
                0.00    0.00      12/12           Life::print(void) [13]
                0.00    0.00      12/12           to_continue(void) [14]
                0.00    0.00       1/1            instructions(void) [16]
                0.00    0.00       1/1            Life::initialize(void) [15]
-----------------------------------------------
[3]     42.5    0.00    0.17                  _start [3]
                0.00    0.17       1/1            main [2]
-----------------------------------------------
                0.15    0.00   48000/48000    Life::update(void) [1]
[4]     37.5    0.15    0.00   48000          Life::neighbor_count(int, int) [4]
-----------------------------------------------
Again, you can see here that Life::neighbor_count(int, int) takes 37.5% of the execution time.
The best way to get useful information is to create versatile input files that exercise the program in different ways; this gives you a broader picture of what the program actually does. For example, a program that parses XML files isn't really tested properly if the XML files used for testing are too homogeneous. You should use input files that correspond to normal usage of the program (command-line arguments etc.). You should also check the worst possible case, which can give you valuable information for code optimization.
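As a concrete sketch of that worst-case idea (purely hypothetical, building on the XML-parser example above), you could generate a pathologically deep input file and then profile the parser while it processes it:
// make_worst_case.cc -- hypothetical helper that writes a deeply nested XML
// document, the kind of pathological input you might feed an XML parser
// while profiling it.
#include <fstream>

int main() {
    const int depth = 100000;          // nesting depth; adjust as needed
    std::ofstream out("worst_case.xml");
    for (int i = 0; i < depth; ++i)
        out << "<node>";
    out << "x";
    for (int i = 0; i < depth; ++i)
        out << "</node>";
    out << "\n";
    return 0;
}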
For more information about GNU Gprof, check out these sites:
http://sourceware.org/binutils/docs/gprof
http://cplusplusworld.com/gnugprof.html
http://www.network-theory.co.uk/docs/gccintro/gccintro_80.html