Hi everyone,

I am really not sure whether such a tool exists, but let me describe it in a few words. I want to measure the performance of a small library I am writing. Basically, the library does arbitrary-precision arithmetic. I would like to compare the performance of the library against the available libraries and collect statistics from the tests. The way I imagine such a tool is as a library itself, something like Boost.Test, where I put a piece of code inside some block and it generates a report to the standard stream or whatever. For example:

PERFORMANCE_ANALYSIS_MODULE(some_library_name)

TIME_CODE(factorial_of_1000000)
{
  // code goes here
}
TIME_CODE(phi_of_5485484848418548484418489664848464847)
{


}
...

I am just looking for portable timing tools for simple performance analysis, so no need to worry about memory analysis, for example. Please note that I would still profile the library using some profiler; what I am talking about here would give a portable way to measure the overall performance anywhere the library could be used, without worrying about which compiler or profiler is available.

Thanks

+2  A: 
nsanders
That is an option, but I would prefer to use something that is already in use and well tested. I could of course hack some macros to do the job, but that would also keep me from working on the library in my free time until I got something usable. Thanks for the suggestion.
AraK
@AraK: What's unused or untested about <ctime>? It's part of the standard library!
Billy ONeal
@Billy ONeal Oh, I don't mean the standard library! I meant that if I build my own library on top of <ctime>, it would take time that I could use elsewhere.
AraK
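
For reference, the macro hack discussed above is only a few lines when built on the standard <ctime> clock. This is just an illustrative sketch (the TIME_CODE name mirrors the question's example, and note that std::clock measures CPU time, not wall-clock time):

#include <cstdio>
#include <ctime>

// Illustrative only: run a block of code and print its CPU time.
#define TIME_CODE(label, block)                                   \
    do {                                                          \
        std::clock_t start_ = std::clock();                       \
        { block }                                                 \
        std::clock_t end_ = std::clock();                         \
        std::printf("%s: %.3f s (CPU)\n", label,                  \
                    double(end_ - start_) / CLOCKS_PER_SEC);      \
    } while (0)

int main()
{
    TIME_CODE("busy loop", {
        volatile unsigned long long sum = 0;
        for (unsigned long long i = 0; i < 100000000ULL; ++i)
            sum += i;
    });
}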
+2  A: 

I believe Wikipedia is almost always a good starting point: http://en.wikipedia.org/wiki/List_of_performance_analysis_tools

If you're on Linux, you probably don't want to miss this: http://www.mostang.com/~davidm/papers/expo97/paper/doc004.html

karlphillip
+2  A: 

I don't know of such a tool, though I've seen benchmarks that included a framework pretty similar to what you're describing (e.g., the old bench++ benchmark suite; unfortunately, that's old enough that I can't find a download link at the moment).

Jerry Coffin
+2  A: 

I'm using AMD CodeAnalyst; it's an easy-to-use profiler, and it's free.

Hernán
+7  A: 

You should definitely try GNU gprof. As a code profiler, it shows you which parts of the program take the most execution time.

Basically you start off by compiling your program with the extra flags -pg and -g, which enable debug symbols and generate extra code for the profiler. You must also pass -pg when linking. For example:

g++ -g -pg -o example example.cc

When you run the program, it writes a file called gmon.out to the current working directory. This file contains the raw profiling data for the program. You then analyze it with gprof, which produces a report showing the most relevant information:

 gprof example > example.out

This creates a flat profile of the program which looks like this:

  Flat profile:

  Each sample counts as 0.01 seconds.
    %   cumulative   self              self     total
   time   seconds   seconds    calls  us/call  us/call  name
   37.50      0.15     0.15    48000     3.12     3.12  Life::neighbor_count(int, int)
   17.50      0.22     0.07                             _IO_do_write
   10.00      0.26     0.04                             __overflow
    7.50      0.29     0.03                             _IO_file_overflow
    7.50      0.32     0.03                             _IO_putc
    5.00      0.34     0.02       12  1666.67 14166.67  Life::update(void)
    5.00      0.36     0.02                             stdiobuf::overflow(int)
    5.00      0.38     0.02                             stdiobuf::sys_write(char const *, int)
    2.50      0.39     0.01                             ostream::operator<<(char)
    2.50      0.40     0.01                             internal_mcount
    0.00      0.40     0.00       12     0.00     0.00  Life::print(void)
    0.00      0.40     0.00       12     0.00     0.00  to_continue(void)
    0.00      0.40     0.00        1     0.00     0.00  Life::initialize(void)
    0.00      0.40     0.00        1     0.00     0.00  instructions(void)
    0.00      0.40     0.00        1     0.00 170000.00  main

In this example (taken from here) the biggest bottleneck of the program is obviously Life::neighbor_count().

Gprof also creates a call graph of the functions.

                         Call graph (explanation follows)


    granularity: each sample hit covers 4 byte(s) for 2.50% of 0.40 seconds

    index % time    self  children    called     name
                    0.02    0.15      12/12          main [2]
    [1]     42.5    0.02    0.15      12         Life::update(void) [1]
                    0.15    0.00   48000/48000       Life::neighbor_count(int, int) [4]
    -----------------------------------------------
                    0.00    0.17       1/1           _start [3]
    [2]     42.5    0.00    0.17       1         main [2]
                    0.02    0.15      12/12          Life::update(void) [1]
                    0.00    0.00      12/12          Life::print(void) [13]
                    0.00    0.00      12/12          to_continue(void) [14]
                    0.00    0.00       1/1           instructions(void) [16]
                    0.00    0.00       1/1           Life::initialize(void) [15]
    -----------------------------------------------

    [3]     42.5    0.00    0.17                 _start [3]
                    0.00    0.17       1/1           main [2]
    -----------------------------------------------
                    0.15    0.00   48000/48000       Life::update(void) [1]
    [4]     37.5    0.15    0.00   48000         Life::neighbor_count(int, int) [4]
    -----------------------------------------------

Again, here you can see that neighbor_count(int,int) takes 37.5% of the execution time.

The best way to get useful information is to create versatile input files that exercise the program in different ways. This will give you a wider perspective on the program's behavior. For example, a program that parses XML files isn't really tested properly if the XML files used for testing are too homogeneous. You should use input files that correspond to the normal usage of the program (command-line arguments etc.). You should also check the worst possible case, which can give you valuable information for code optimization.

For more information about GNU Gprof, check out these sites:

http://sourceware.org/binutils/docs/gprof

http://cplusplusworld.com/gnugprof.html

http://www.network-theory.co.uk/docs/gccintro/gccintro_80.html

vtorhonen
The questioner's 2nd paragraph seems to rule out gprof.
nsanders
+1  A: 

Personally, I'm using gprof whenever I can, but seeing as that isn't an option for you, I would suggest PAPI. It's a largely CPU-independent and portable library for performance-counter measurements. Recent versions support the Linux Performance Events infrastructure, so you definitely don't have to go through the trouble of recompiling a kernel to use it, if your distro supports it. Besides complete profiling information, you can count occurrences of CPU events such as cache misses and branch mispredictions.

It's not plagued by the problems of rdtsc, it can perform per-thread measurements, and it has some pretty neat statistical profiling modes (albeit low-level). It has wrappers for C++, naturally, so you're clear on that.

I'm not clear on its support for Windows, but I think they reinstated support for 32-bit versions as of 3.7.0. You might want to confirm that, if Windows is what you work on.

You can find it on http://icl.cs.utk.edu/papi/.
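
To give a flavour of the API, here is a minimal sketch that counts two common hardware events around a piece of code using PAPI's low-level interface (which events are available depends on the CPU, and error handling is kept to a bare minimum):

#include <iostream>
#include <papi.h>

int main()
{
    // Initialise the library; on success it returns the version it was built against.
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;

    // Build an event set with two common counters.
    int events = PAPI_NULL;
    if (PAPI_create_eventset(&events) != PAPI_OK ||
        PAPI_add_event(events, PAPI_TOT_CYC) != PAPI_OK ||  // total cycles
        PAPI_add_event(events, PAPI_L1_DCM)  != PAPI_OK)    // L1 data cache misses
        return 1;

    PAPI_start(events);

    // ... code to measure goes here ...

    long long counts[2] = {0, 0};
    PAPI_stop(events, counts);

    std::cout << "cycles: " << counts[0]
              << ", L1 data cache misses: " << counts[1] << '\n';
}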

Michael Foukarakis
Of the answers this sounds most like what the OP actually asked for.
Zack
A: 

On x86, performance can be measured manually in a fairly portable way using the TSC (time-stamp counter).
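
For illustration, reading the counter is a one-liner with the compiler intrinsic. A minimal sketch for GCC/Clang (MSVC exposes __rdtsc through <intrin.h> instead); keep in mind the caveats raised in the comment below:

#include <iostream>
#include <x86intrin.h>  // __rdtsc on GCC/Clang

int main()
{
    unsigned long long start = __rdtsc();

    // ... code to measure goes here ...

    unsigned long long end = __rdtsc();
    std::cout << "elapsed TSC ticks: " << (end - start) << '\n';
}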

vitaly.v.ch
Actually it can't; on modern processors the TSC may stop counting in sleep states, may not be synchronized between cores, and doesn't flush the instruction queue or even act as a reorder barrier, making anything it tells you suspect.
Zack
+2  A: 

I use the Boost.Test framework, with the addition of a "scoped_timer" class (.h | .cpp) to log how long something took. Most tests include some set-up code you don't want to measure, so you invariably want to time just some inner part of each test rather than the whole thing.

For example, here I wanted to benchmark the performance of various 3D transform routines in terms of millions-of-points-per-second.

Some more comments on this sort of testing here.

I can't recommend reusing an existing unit-test framework for this sort of thing strongly enough; if you go down the route of developing a new "performance test framework", you'll mostly just end up reinventing the wheel.
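
For a rough idea of the pattern (this is not timday's actual scoped_timer, just an illustrative sketch along the same lines, using std::clock for portability):

#define BOOST_TEST_MODULE transform_benchmarks
#include <boost/test/included/unit_test.hpp>

#include <cstdio>
#include <ctime>

// Illustrative scoped timer: reports the CPU time spent in the enclosing scope.
class scoped_timer {
public:
    explicit scoped_timer(const char* label)
        : label_(label), start_(std::clock()) {}
    ~scoped_timer() {
        double seconds = double(std::clock() - start_) / CLOCKS_PER_SEC;
        std::printf("%s: %.3f s\n", label_, seconds);
    }
private:
    const char*  label_;
    std::clock_t start_;
};

BOOST_AUTO_TEST_CASE(transform_throughput)
{
    // Set-up that should not be measured goes here.

    {
        scoped_timer timer("transform 1M points");
        // ... only the code inside this block is timed ...
    }

    BOOST_CHECK(true);  // plus whatever correctness checks the test needs
}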

timday
Thanks. That's exactly what I am looking for :)
AraK