I've got an application that does little computational CPU work and mostly memory accesses (allocating objects and moving them around; there's little numeric or arithmetic code).

How can I measure the share of time that I am spending on memory access latency (due to cache misses), with the CPU sitting idle?

I should note that the app is running on a Hyper-V guest; I'm not sure it will pose any difficulties, but it might.

+1  A: 

Unless you have a latency built into the system, just run the application for some time on a dedicated machine and check the CPU counters. If the app uses 100% of the CPU core it can access, it's CPU-bound. Otherwise, it spends time on other things like memory allocations and I/O.

Stephane
So, are memory access latencies not included in CPU usage performance counters?
jkff
I don't think they're specifically *excluded*, but that sort of granularity is hard to pin down. Your process gets the CPU for a fixed amount of time, but waiting on cache loads still counts as CPU "usage", since your process hasn't been evicted. I don't know that it's possible to really get hard numbers for memory wait time without snooping the bus.
TMN
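A rough way to see the point about CPU "usage" is to compare CPU time with wall-clock time for a loop that does almost nothing but dependent memory loads. The sketch below is only an illustration under stated assumptions: `cpu_vs_wall` and its parameters are made-up names, and in Python the interpreter overhead dominates the actual cache-miss cost. It shows the accounting, not a measurement of memory stalls.

```python
import random
import time


def cpu_vs_wall(n_items=1_000_000, steps=2_000_000, seed=0):
    """Walk a large list in random order; return (cpu_seconds, wall_seconds)."""
    rng = random.Random(seed)
    # Build a single random cycle through all indices, so that every
    # step depends on the result of the previous load.
    order = list(range(n_items))
    rng.shuffle(order)
    next_index = [0] * n_items
    for i in range(n_items):
        next_index[order[i]] = order[(i + 1) % n_items]

    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    pos = 0
    for _ in range(steps):
        pos = next_index[pos]
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall


if __name__ == "__main__":
    cpu, wall = cpu_vs_wall()
    print(f"CPU time:  {cpu:.3f} s")
    print(f"Wall time: {wall:.3f} s")
    print(f"CPU share: {cpu / wall:.0%}")
```

On an otherwise idle machine the CPU share should come out close to 100%, even though part of that time is spent waiting on memory. The OS-level counter cannot tell the two apart.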
With `Callgrind` you can get the counters for `L1` and `L2` cache misses. Aggregate this data with profiling info and you get an estimate of how well the data access patterns for your application perform.
the_void
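For reference, a minimal sketch of driving Callgrind with cache simulation from a script. It assumes a native Linux binary with `valgrind` and `callgrind_annotate` on the PATH; `./my_app`, `callgrind_command` and `run_callgrind` are placeholder names, not part of any tool. Note that Callgrind simulates the caches rather than reading hardware counters, so the miss counts are estimates.

```python
import shutil
import subprocess


def callgrind_command(program, *args, out_file="callgrind.out"):
    """Build the argv for running `program` under Callgrind with cache simulation."""
    return [
        "valgrind",
        "--tool=callgrind",
        "--cache-sim=yes",  # collect simulated cache hit/miss events
        f"--callgrind-out-file={out_file}",
        program,
        *args,
    ]


def run_callgrind(program, *args, out_file="callgrind.out"):
    """Run the program under Callgrind and return the annotated report as text."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind not found on PATH")
    subprocess.run(callgrind_command(program, *args, out_file=out_file), check=True)
    report = subprocess.run(
        ["callgrind_annotate", out_file],
        check=True,
        capture_output=True,
        text=True,
    )
    return report.stdout


if __name__ == "__main__":
    # "./my_app" is a hypothetical binary name used for illustration.
    print(run_callgrind("./my_app"))
```

The report lists per-function event counts, including the simulated first-level and last-level data cache misses, which can then be lined up against the time profile.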
+2  A: 

You could always profile your application to see where it spends most of the time.

You can learn a lot about your application's behaviour and data access patterns this way.

If you are using Linux, you have a wide range of available tools for profiling, like:

EDIT:

For a more exact measurement of the processor performance as well as memory accesses, you could also try the AMD CodeAnalyst Performance Analyzer. Here are instructions on how to use it with Intel processors, though I haven't tried it myself.

Another tool that you might also find useful is the Intel Performance Tuning Utility.

the_void
You see, I've already profiled the application. It has a reasonably flat performance profile with much of the time taken by various memory-accessing functions like memset, memmove etc. (actually this is a .NET application running on Windows, and these functions are called by the CLR during object construction). I'd like to know whether their time is dominated by CPU or by the memory bus.
jkff
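One crude way to get a feel for the CPU-versus-memory-bus question, short of hardware counters, is to time bulk copies of buffers that fit in cache against buffers that clearly do not. The sketch below is only a proxy under stated assumptions: `copy_throughput` is a made-up helper, the buffer sizes are arbitrary guesses at cache-resident and non-resident working sets, and it says nothing about what the CLR does internally.

```python
import time


def copy_throughput(buffer_bytes, total_bytes=1 << 30):
    """Return the approximate copy rate in GiB/s for buffers of the given size."""
    # Non-zero fill so the source pages are really backed by memory.
    src = bytearray(b"\x01") * buffer_bytes
    dst = bytearray(buffer_bytes)
    dst[:] = src  # warm-up copy, so page faults on dst are not timed
    repeats = max(1, total_bytes // buffer_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        # Same-length slice assignment; CPython is expected to do this
        # as one bulk byte copy rather than element by element.
        dst[:] = src
    elapsed = time.perf_counter() - start
    return repeats * buffer_bytes / elapsed / (1 << 30)


if __name__ == "__main__":
    for size in (16 << 10, 256 << 10, 4 << 20, 64 << 20):
        rate = copy_throughput(size)
        print(f"{size >> 10:>8} KiB buffers: {rate:6.2f} GiB/s")
```

If the rate drops sharply once the buffers outgrow the caches, copies of that size are limited by memory rather than by the CPU. Running the same comparison inside the Hyper-V guest and on bare metal might also show whether virtualization changes the picture.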
You could try **AMD CodeAnalyst Performance Analyzer**: http://developer.amd.com/cpu/codeanalyst/Pages/default.aspx or http://www.virtualdub.org/blog/pivot/entry.php?id=288
the_void