views:

110

answers:

1

Modern processors use all sorts of tricks to bridge the gap between the high speed of their processing elements and the slowness of external memory. In performance-critical applications the way you structure your code can therefore have a considerable influence on its efficiency. For instance, researchers using the SLO analyzer were able to fix cache locality problems and double the execution speed of several SPEC2000 benchmark programs. I'm looking for recommendations for an open source tool that uses a processor's performance monitoring support to locate and analyze architectural inefficiencies, such as cache misses, branch mispredictions, front-end stalls, cache pollution through address aliasing, long-latency instructions, and TLB misses. I'm aware of Intel's VTune (commercial), AMD's CodeAnalyst (free, but not open source), and Cachegrind (relies on simulation).

+1  A: 

For Linux, OProfile works well. In fact, AMD's CodeAnalyst uses OProfile as its backend.

OProfile uses the processor's internal performance monitoring mechanism to analyze architectural inefficiencies.
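
As a rough illustration, a session with the legacy `opcontrol` interface might look like the sketch below. The event name `LLC_MISSES`, the sample count, the `run` helper, and the `profile_with_oprofile` function are assumptions made for this example; event names differ between CPUs, so pick one from `opcontrol --list-events`. Also note that OProfile 1.0 and later replaced `opcontrol` with `operf`, so this only applies to older releases.

```shell
#!/bin/bash
# Sketch of a legacy OProfile (opcontrol) session; normally run as root.
# EVENT and COUNT are placeholders: choose a real event name from
# `opcontrol --list-events` on your own machine.
EVENT="${EVENT:-LLC_MISSES}"   # assumed event name, CPU-specific
COUNT="${COUNT:-10000}"        # take one sample every COUNT events

# Hypothetical helper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper: profile_with_oprofile <binary> [args...]
profile_with_oprofile() {
    run opcontrol --reset                                           # discard samples from earlier sessions
    run opcontrol --setup --no-vmlinux --event="${EVENT}:${COUNT}"  # no kernel image, one hardware event
    run opcontrol --start                                           # start the daemon and begin sampling
    run "$@"                                                        # the workload to profile
    run opcontrol --dump                                            # flush collected samples to disk
    run opcontrol --shutdown                                        # stop sampling and stop the daemon
    run opreport --symbols "$1"                                     # per-symbol sample counts for the binary
}
```

Calling `DRY_RUN=1 profile_with_oprofile ./my_app` prints the command sequence without touching the profiler, which is a cheap way to check the event specification before running it for real.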

arsane
Great! OProfile's opcontrol --list-events does indeed list lots of events one would want to monitor.
Diomidis Spinellis