Linux C++: how to profile time wasted due to cache misses?

+9 Q:

I know that I can use gprof to profile my code.

However, I have this problem: I have a smart pointer that adds an extra level of indirection (think of it as a proxy object).

As a result, this extra layer affects pretty much every function and interferes with caching.

Is there a way to measure the time my CPU wastes due to cache misses?

Thanks!

+8 A:

You could find a tool that accesses the CPU performance counters. There is probably a register in each core that counts L1, L2, etc. misses. Alternatively, Cachegrind simulates the cache hierarchy while it runs your program.
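As a rough illustration of reading such a counter directly, here is a minimal sketch built on the Linux `perf_event_open` system call. The helper names (`open_cache_miss_counter`, `measure_cache_misses`, `stride_sum`) are made up for this example. `PERF_COUNT_HW_CACHE_MISSES` usually means last-level cache misses, but the exact event depends on the CPU. The call can also fail (permissions, virtual machines, containers), so the sketch treats the count as optional.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// glibc provides no wrapper for perf_event_open, so invoke the raw syscall.
// Returns a file descriptor, or -1 if the counter cannot be opened.
inline int open_cache_miss_counter() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;  // usually last-level misses
    attr.disabled = 1;        // start stopped; enabled explicitly below
    attr.exclude_kernel = 1;  // count user-space activity only
    attr.exclude_hv = 1;
    // pid = 0, cpu = -1: the calling thread, on whichever CPU it runs.
    return static_cast<int>(
        syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
}

struct Measurement {
    long long result;                          // value returned by the workload
    std::optional<std::uint64_t> cache_misses; // empty if counters unavailable
};

// Runs `work` once and reports how many cache-miss events were counted.
template <class F>
Measurement measure_cache_misses(F&& work) {
    const int fd = open_cache_miss_counter();
    if (fd != -1) {
        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    }
    const long long result = work();
    std::optional<std::uint64_t> misses;
    if (fd != -1) {
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        std::uint64_t count = 0;
        if (read(fd, &count, sizeof(count)) ==
            static_cast<ssize_t>(sizeof(count))) {
            misses = count;
        }
        close(fd);
    }
    return {result, misses};
}

// Hypothetical workload: visits every element exactly once, but in a
// strided order, so larger strides are less cache-friendly.
long long stride_sum(const std::vector<int>& data, std::size_t stride) {
    assert(stride > 0);
    long long sum = 0;
    for (std::size_t start = 0; start < stride; ++start)
        for (std::size_t i = start; i < data.size(); i += stride)
            sum += data[i];
    return sum;
}
```

Calling `measure_cache_misses([&] { return stride_sum(data, 1); })` and again with a larger stride on a big vector should show different counts when the counter is available. The numbers are noisy, so compare several runs rather than trusting a single one.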

However, I don't think that would be insightful. Your proxy objects are presumably modified by their own methods. A conventional profiler will tell you how much time those methods are taking. No profiling tool will tell you how performance would improve without that source of cache pollution. That's a matter of reducing the size and structure of the program's working set, which isn't easy to extrapolate.

A quick Google search turned up boost::intrusive_ptr which might interest you. It doesn't appear to support something like weak_ptr, but converting your program might be trivial, and then you would know for sure the cost of the non-intrusive ref counts.
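Short of converting the whole program, one way to get a feel for the cost is a small A/B experiment. The sketch below is only an illustration: `Payload`, `Proxy`, `Fixture` and the helper functions are hypothetical stand-ins rather than your actual smart pointer, and the 64-byte padding assumes a typical cache-line size. It walks the same objects once through a plain pointer and once through an extra proxy hop, in shuffled order so the hardware prefetcher cannot hide the misses.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// One payload per (assumed) 64-byte cache line.
struct Payload {
    long long value;
    char padding[56];
};

// Hypothetical proxy: handle -> Proxy -> Payload, i.e. two dependent loads.
struct Proxy {
    Payload* target;
};

// Owns the objects plus shuffled handle lists pointing into them.
// Move it around, but do not copy it: the handles point into the vectors.
struct Fixture {
    std::vector<Payload> payloads;
    std::vector<Proxy> proxies;
    std::vector<Payload*> direct;   // one hop to the payload
    std::vector<Proxy*> indirect;   // two hops to the payload
};

Fixture make_fixture(std::size_t n, unsigned seed = 42) {
    assert(n > 0);
    Fixture f;
    f.payloads.resize(n);
    f.proxies.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        f.payloads[i].value = static_cast<long long>(i);
        f.proxies[i].target = &f.payloads[i];
        f.direct.push_back(&f.payloads[i]);
        f.indirect.push_back(&f.proxies[i]);
    }
    // Visit objects in random order to defeat sequential prefetching.
    std::mt19937 rng(seed);
    std::shuffle(f.direct.begin(), f.direct.end(), rng);
    std::shuffle(f.indirect.begin(), f.indirect.end(), rng);
    return f;
}

long long sum_direct(const std::vector<Payload*>& handles) {
    long long sum = 0;
    for (const Payload* p : handles) sum += p->value;
    return sum;
}

long long sum_via_proxy(const std::vector<Proxy*>& handles) {
    long long sum = 0;
    for (const Proxy* p : handles) sum += p->target->value;
    return sum;
}

// Times one call of `work` and stores its return value in `result_out`.
template <class F>
double seconds_taken(F&& work, long long& result_out) {
    const auto t0 = std::chrono::steady_clock::now();
    result_out = work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

With a fixture much larger than the last-level cache, timing `sum_direct(f.direct)` against `sum_via_proxy(f.indirect)` gives a rough upper bound on what the extra hop can cost. A real program will mix these accesses with other work, so treat the ratio as indicative only.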

Potatoswatter 2010-03-21 11:39:14

Indeed, it's not possible to use a `weak_ptr` with an intrusive counter, because the counter is destroyed along with the object, so the `weak_ptr` has no way to check whether the object is still valid without actually accessing it.

Matthieu M. 2010-03-21 13:07:08

@Matthieu: If the dependency graph is known to be a single cycle, I think you can use each object's link (there must be only one) as a validity flag. For the purpose of destruction anyway. Traversing a random graph would require thread-local storage, but that's not impossible.

Potatoswatter 2010-03-21 14:30:51

+1 A:

My advice would be to use PTU (Performance Tuning Utility) from Intel.

This utility is the direct descendant of VTune and provides the best sampling profiler available. You'll be able to track where the CPU is spending or wasting time (with the help of the available hardware events), with no slowdown of your application or perturbation of the profile. And of course you'll be able to gather all the cache-line miss events you are looking for.

Fabien Hure 2010-03-21 11:47:02

Problem is, cache pollution will cause misses all over the place. What pattern is there to look for?

Potatoswatter 2010-03-21 11:50:44

The first thing you need to find out is whether there really is a problem in your particular application. Profile your application as a user would use it, then check the report to see where your bottlenecks are. You might find a high number of L2 cache-line misses, but they might be caused by other parts of your application, and you might discover other issues that you were not worried about. That does not mean you have no problem with your smart pointers, only that it may be hidden behind more pressing bottlenecks. Tell me how it goes; I'd be happy to help with any performance issue.

Fabien Hure 2010-03-21 15:45:55

+4 A:

You could try Cachegrind and its front end, KCachegrind.
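To see what the output looks like before pointing it at a large program, a tiny test case helps. The sketch below is an assumed example, not taken from the question: two functions (hypothetical names) sum the same matrix in cache-friendly and cache-unfriendly order. Built with `-g` and run as `valgrind --tool=cachegrind ./app`, followed by `cg_annotate cachegrind.out.<pid>`, the second function should show noticeably more data-cache read misses on a large matrix. Recent Valgrind releases may need `--cache-sim=yes` to enable the cache simulation, and for KCachegrind it is common to run `valgrind --tool=callgrind --cache-sim=yes ./app` instead and open the resulting `callgrind.out.<pid>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored row by row, walking memory sequentially.
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Same sum, but walking down each column: consecutive accesses are
// `cols` elements apart, so most of them touch a different cache line.
long long sum_col_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

Call both from `main` on a matrix of a few thousand rows and columns so the data does not fit in cache. The per-function miss counts in the annotated output make the difference easy to spot.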

Taavi 2010-03-21 11:53:51

Cachegrind works very well.

caf 2010-03-21 21:57:06

+3 A:

It depends on what OS and CPU you are using. E.g. for Mac OS X and x86 or ppc, Shark will do cache miss profiling. Ditto for Zoom on Linux.

Paul R 2010-03-21 11:54:21

+1 A:

If you're running an AMD processor, you can get CodeAnalyst, apparently free as in beer.

http://developer.amd.com/cpu/codeanalyst/codeanalystlinux/Pages/default.aspx

Arthur Kalliokoski 2010-03-21 12:21:25

+1 A:

Another tool for CPU performance counter-based profiling is oprofile. You can view its results using kcachegrind.

jpalecek 2010-03-21 12:28:06

+1 A:

Here's kind of a general answer.

For example, if your program is spending, say, 50% of its time in cache misses, then 50% of the time when you pause it, the program counter will be at the exact locations where it is waiting for the memory fetches that are causing the cache misses.
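The arithmetic behind this is plain sampling. If `hits` out of `samples` pauses land on instructions that are waiting for memory, the fraction `hits / samples` estimates the share of time lost, with a binomial standard error of sqrt(p(1-p)/n). The helper below is only a sketch of that calculation. The struct and function names are invented for the example, and it assumes the pauses are independent and taken at random times.

```cpp
#include <cassert>
#include <cmath>

struct PauseEstimate {
    double fraction;   // estimated share of time spent at the stalled sites
    double std_error;  // binomial standard error of that estimate
};

// hits: pauses that landed on a memory-stalled instruction.
// samples: total number of pauses taken.
PauseEstimate estimate_from_pauses(unsigned hits, unsigned samples) {
    assert(samples > 0 && hits <= samples);
    const double p = static_cast<double>(hits) / samples;
    return {p, std::sqrt(p * (1.0 - p) / samples)};
}
```

For instance, 10 hits in 20 pauses gives an estimate of 0.5 with a standard error of about 0.11, so even a handful of pauses is enough to tell a 50% problem from a 5% one.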

Mike Dunlavey 2010-03-23 01:32:47
