views: 278 · answers: 6

I am trying to figure out where my C++ program is spending its time, using gprof. Here's my dilemma: if I compile with the same optimization settings I use for my release build, pretty much everything gets inlined, and gprof tells me, unhelpfully, that 90% of my time is spent in a core routine, where everything was inlined. On the other hand, if I compile with inlining disabled, the program runs an order of magnitude slower.

I want to find out how much time procedures called from my core routine are taking, when my program is compiled with inlining enabled.

I am running 64-bit Ubuntu 9.04 on a quad-core Intel machine. I looked into google-perftools, but that doesn't seem to work well on x86_64. Running on a 32-bit machine is not an option.

Does anyone have suggestions as to how I can more effectively profile my application, when inlining is enabled?

Edit: Here is some clarification of my problem. I apologize if it was not clear initially.

I want to find where the time was being spent in my application. Profiling my optimized build resulted in gprof telling me that ~90% of the time is spent in main, where everything was inlined. I already knew that before profiling!

What I want to find out is how much time the inlined functions are taking, preferably, without disabling optimization or inlining in my build options. The application is something like an order of magnitude slower when profiling with inlining disabled. This difference in execution time is a convenience issue, but also, I am not confident that the performance profile of the program built with inlining disabled will strongly correspond to the performance profile of the program built with inlining enabled.

In short: is there a way to get useful profiling information on a C++ program without disabling optimization or inlining?

A: 

It doesn't matter that the code is running slower (your convenience aside, of course) - the profiler will still tell you the correct proportion of time spent in each function.

anon
another 10 up-votes for stating the obvious and adding no value -- whee :P
Hassan Syed
Maybe -- but then again, it's entirely possible that this isn't the case. If you consider function calls as a tree, this answer takes for granted that all subtrees benefit equally from inlining. In reality, this is almost always going to be false. The difference will be insignificant in some cases, but quite significant in others.
Jerry Coffin
that and most profilers are driven by hardware interrupts -- and include other noise.
Hassan Syed
If the inlining enables better optimization of the core function (which is very possible), then profiling without the inlining will be misleading.
Novelocrat
I don't understand why this one is being downvoted. What Neil stated is true. If you have a bottleneck, and turn inlining off, that bottleneck will be obvious in the profiler. You'll still be at least as slow as your slowest function. If your biggest slowdown is the calling overhead, you should probably pat yourself on the back and call it a day. If there really is no obvious place where the code is slow, then you have to judge whether it is worth optimizing for 2-3% improvements.
Steve
Any profiling is of course only a rough indicator of what will happen in real, optimised code. However, the proportion of time spent in each function is likely to be fairly representative of what is happening in the real code, and provides a good first approximation of which functions would benefit from hand optimisation, which is surely the object of the exercise.
anon
@Hassan Syed: Most profilers *do* include some noise, but it's not (even close to) unique to the situation of all the functions being generated inline -- i.e. it's true in general, not just the situation at hand.
Jerry Coffin
I am not confident that a program compiled with inlining disabled will be equivalent, modulo overall runtime, to a program where the compiler has inlined functions like crazy: inlining can enable further optimizations to take place, such as constant propagation, loop hoisting, loop fusion, etc.
Bradford Larsen
Good programmers use a profiler to get answers. _Really_ good programmers use a profiler to get insight.
MSalters
Mike Dunlavey
... Sorry to blather on, but I have a problem with the vocabulary we use to discuss performance. I would prefer we not talk about *functions* and how *slow* they are, but about *lines of code* and what *fraction of time* they *are responsible for* (or *cost*, for short). In my experience, all performance problems have a very precise location, and cost a healthy fraction of time, which is saved when they are "fixed", by removing them, invoking them less, or otherwise finding a better way to accomplish the same purpose.
Mike Dunlavey
... and *bottleneck* - implies some limiting resource that constrains how much of our legitimate business can be done at a time. Real performance problems are like those fractal curves (http://en.wikipedia.org/wiki/Koch_snowflake) where in the middle of each direct line you take a little side-trip, and in the middle of each line of that you take another, and so on. Each side-trip (function call) seems innocent, but in aggregate they multiply the length of the trip by a large factor, unnecessarily so. Each call is like a credit card purchase, tempting you to think it doesn't cost much.
Mike Dunlavey
... and many of them don't - they are a good thing - but some are killers and they don't advertise themselves in the source code. There's no way to guess reliably where they are. Stackshots catch them in the act. (http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/1562802#1562802). Repeat: this has **nothing** to do with inlining - the entry/exit cost of function calls could be absolutely free and perfectly optimized. It has **everything** to do with the extra semantic work you don't realize you're asking for.
Mike Dunlavey
+1  A: 

Develop a few macros using your CPU's high-resolution timing mechanism (e.g., RDTSC on x86) -- routines that don't rely on system calls -- and bind a single thread running your core loop to a specific CPU (set its affinity). You would need to implement the following macros.

PROF_INIT           // allocate any needed variables -- probably a const char*
PROF_START("name")  // start a timer
PROF_STOP()         // stop the timer and compute the difference,
                    // which you write out via an async fd

I had something like this that I placed in every function I was interested in. I made sure the macros placed the timing calls into the context of the call tree -- this is possibly the most accurate way to profile.

Note:

This method is driven by your code -- it does not rely on an external tool to snoop on your program in any way. Snooping, sampling, and interrupt-driven profiling are inaccurate when it comes to small sections of code. Besides, you want to control where and when the timing data is collected -- at specific constructs in your code, such as loops, the beginning of a recursive call-chain, or mass memory allocations.

-- edit --

You might be interested in the link from this answer to one of my questions.

Hassan Syed
+1. Should be worth a try at least.
Marcus Lindblom
+7  A: 

I assume what you want to do is find out which lines of code are costing you enough to be worth optimizing. That is very different from timing functions. You can do better than gprof.

Here's a fairly complete explanation of how to do it.

You can do it by hand, or use one of the profilers that can provide the same information, such as oprofile and RotateRight/Zoom.

BTW, inlining is of significant value only if the routines being inlined are small and don't call functions themselves, and if the lines where they are being called are active enough of the time to be significant.

As for the order of magnitude performance ratio between debug and release build, it may be due to a number of things, which may or may not include the inlining. You can use the stackshot method mentioned above to find out for certain just what's going on in either case. I've found that debug builds can be slow for other reasons, like recursive data structure validation, for example.

Mike Dunlavey
(+1) nice info, especially the stackshot link (which says the key is capturing and not sampling), although all the tools seem to be measuring rather than capturing -- stackshot too.
Hassan Syed
@Hassan: Yeah. I think of it as a matter of emphasis, on the precision of location, rather than on the precision of measurement. Oprofile and Zoom take stack samples on wall-clock time, and they can give you, per statement, the % of samples containing that statement. If they give it from a lot of samples, that's OK but not really necessary. Personally I do it by hand, because then I can also see more state information such as the data being worked on. Often that is crucial information.
Mike Dunlavey
Yes I am in agreement, I do this stuff by hand as well -- and consensus in profiling game engines / embedded work / OS routines and RTOS stuff pretty much uses this approach as well.
Hassan Syed
@Hassan: No kidding? Boy, they've been awful quiet on SO. I'm sure in high performance graphics every cycle counts. I've been using this method since doing assembler on a mini in '78. Even then, the hardware guys wanted to provide me with timers for performance tuning. I said, thanks but no thanks. A "pause" button, and "single step" are not only the simplest tools, but the most effective.
Mike Dunlavey
The profile build I was attempting used options identical to my release build, except for the addition of -fno-inline and -pg. I did not explicitly inline any of my functions; I have left that to the compiler, under the assumption that it can do a better job than I can at deciding what is appropriate to inline. Because of all the inlining the compiler was doing, I was unable to find out even which functions were taking up time---they all got inlined. Profiling with inlining explicitly disabled let me get this info. Finding out which lines are expensive would also be useful.
Bradford Larsen
@Bradford: I would just take stackshots either way. They can tell exactly what's going on. Maybe the issue is inlining, but frankly that would really surprise me. More likely it is something I would never have been able to guess. Don't think too hard about it - just do it - and see what it tells you. It's easy. I would start with the slow version. Something is costing 90% of the time, so you will see exactly what it is as soon as you pause it, with 90% certainty. Pausing it a few more times will leave no doubt.
Mike Dunlavey
... I'll be interested to hear what you find out.
Mike Dunlavey
@Mike: My problem was not stated as clearly as it could have been initially. My issue is that attempting to profile with the same optimization settings as my release build results in gprof telling me nothing useful: almost all the code of my inner loop gets inlined into the body, and gprof just tells me "gee, you spend all your time in your main loop." See my edited question for more information.
Bradford Larsen
@Mike: When I have more time (paper deadline in 2 days), I will try the "stackshot" method you suggest. In the meantime, I compiled with profiling support and with inlining explicitly disabled, but other build options the same as in my release build. When running this build, my application ran many times slower, but gprof was able to give me more useful information than "90% of the time is spent in the main loop, which is the whole program". It turns out a big culprit is code that empties a hash table, taking about 25% of the total run time. The rest of the profile was fairly flat.
Bradford Larsen
@Bradford: I'm glad **gprof** gave you some insight. Stackshots should be able to give more. I just got a comment on this answer (http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024) where the user claims a 60x speedup.
Mike Dunlavey
+1  A: 

You can use a more powerful profiler, such as Intel's VTune, which can give you assembly line level of performance detail.

http://software.intel.com/en-us/intel-vtune/

It's for Windows and Linux, but does cost money...

Inverse
Yeah, that would do the trick -- in theory, that is. Those tools are horribly unreliable on Windows -- they gave me BSODs on Win2003 and XP on many machines and generally caused other oddities as well.
Hassan Syed
+1  A: 

Would valgrind be any more helpful?

Dmitry
A: 

You can use gcov to give you line-by-line execution counts. This should at least tell you which inlined functions are the bottleneck.

ergosys
I played around with gcov a little bit, at your suggestion. It gives call counts, which is not enough to tell me what's taking lots of time.
Bradford Larsen