views:

166

answers:

3

I have a profiling issue - imagine I have the following code...

void main()
{
    well_written_function();
    badly_written_function();
}
void well_written_function()
{
    for (a small number)
    {
        highly_optimised_subroutine();
    }
}
void badly_written_function()
{
    for (a wastefully and unnecessarily large number)
    {
        highly_optimised_subroutine();
    }
}
void highly_optimised_subroutine()
{
    // lots of code
}

If I run this under vtune (or other profilers) it is very hard to spot that anything is wrong. All the hotspots will appear in the section marked "// lots of code" which is already optimised. The badly_written_function() will not be highlighted in any way even though it is the cause of all the trouble.

Is there some feature of vtune that will help me find the problem?

Is there some sort of mode whereby I can find the time taken by badly_written_function() and all of its sub-functions?

A: 
Joseph Quinsey
+2  A: 

That's usually known as a "callgraph profile", and I'm fairly sure Visual Studio will do that.

Mike Seymour
+1  A: 

May I suggest my own open source profiler raven::set::cRunWatch? It is designed for exactly this problem and works on Windows using Visual Studio 2008 Standard Edition, so you do not need to pay for the version that has the profiler included.

I have taken your code, rearranged it slightly so it compiles without forward declarations and added the necessary calls to cRunWatch

// RunWatchDemo.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

void highly_optimised_subroutine()
{
    raven::set::cRunWatch runwatch("highly_optimised_subroutine");
    Sleep( 2 );
}


void badly_written_function()
{
    raven::set::cRunWatch runwatch("badly_written_function");
    for (int k = 1; k < 1000; k++ )
    {
        highly_optimised_subroutine();
    }
}

void well_written_function()
{
    raven::set::cRunWatch runwatch("well_written_function");
   for (int k = 1; k < 10; k++ )
    {
        highly_optimised_subroutine();
    }
}


int _tmain(int argc, _TCHAR* argv[])
{
raven::set::cRunWatch::Start();

    well_written_function();
    badly_written_function();

raven::set::cRunWatch::Report();

    return 0;
}   

When run this produces the output

raven::set::cRunWatch code timing profile
                    Scope   Calls       Mean (secs)     Total
highly_optimised_subroutine     1008    0.002921        2.944146
   badly_written_function        1      2.926662        2.926662
    well_written_function        1      0.026239        0.026239

This shows badly_written_function to be the very close second time user, and therefore the culprit.

You can obtain cRunWatch from here You will recognize the sample code in the User Guide :-)

ravenspoint
+1 But requires BOOST.
Joseph Quinsey
It does require boost. Almost every program I write uses boost in one way or another. Boost is free and incredibly useful.
ravenspoint
@ravenspoint : But I work on 15-year-old HPUX 10.20 platforms. Don't ask :(
Joseph Quinsey
@Joseph Quinsey Then I expect you are also missing the high-resolution performance counter, which is the beating heart of cRunWatch.
ravenspoint