All the .NET profilers I know of don't take the effect of the CPU cache into account.

Given that reading a field from the CPU cache can be 100x faster than reading it from main memory, it can be a big factor. (I just had to explain this in an answer.)
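For example, a small micro-benchmark along these lines (purely illustrative, not taken from any profiler, and the exact ratio varies by machine) makes the gap visible: both loops read the same number of ints, but the strided one has to pull far more cache lines from main memory, so on most machines it runs several times slower.

    using System;
    using System.Diagnostics;

    class CacheDemo
    {
        static void Main()
        {
            var data = new int[64 * 1024 * 1024];   // ~256 MB, far bigger than any CPU cache
            const int stride = 16;                  // 16 ints * 4 bytes = one 64-byte cache line

            long sum = 0;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < data.Length; i++)   // sequential: ~16 reads per cache-line fetch
                sum += data[i];
            Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            for (int offset = 0; offset < stride; offset++)
                for (int i = offset; i < data.Length; i += stride)  // strided: each pass re-fetches
                    sum += data[i];                                 // every cache line from memory
            Console.WriteLine($"strided:    {sw.ElapsedMilliseconds} ms  (checksum {sum})");
        }
    }

A per-function profiler will tell you both loops spend their time on the same add instruction; it won't tell you one of them is waiting on memory.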

I have seen too many people spend a long time speeding up loops that a profiler says are slow, when in real life the CPU cache makes them fast.


E.g. I wish to be able to see whether a data access is missing the CPU cache a lot, as well as getting basic profiling results I can trust more.

In the past I have found that making my data more compact so it all fits in the CPU cache, or changing the order the data is accessed in, can have a big effect. E.g.

AccessArrayFromStartAndDoSomething()  
AccessArrayFromEndAndDoSomethingElse()

is better than

AccessArrayFromStartAndDoSomething()  
AccessArrayFromStartAndDoSomethingElse()

if the array will not fit in the CPU cache, but it is very hard to find that type of improvement.
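In C# terms the idea is roughly this (the method names are hypothetical placeholders, and it assumes the array is much larger than the CPU cache): after the forward pass, the end of the array is the only part still resident in cache, so the second pass should start there instead of going back to index 0.

    static double DoSomething(double x) => x * 1.0001;   // placeholder work
    static double DoSomethingElse(double x) => x + 1.0;  // placeholder work

    static void ProcessTwice(double[] data)
    {
        // First pass: front to back.
        for (int i = 0; i < data.Length; i++)
            data[i] = DoSomething(data[i]);

        // Second pass: back to front, so it starts on the elements the first
        // pass touched last, i.e. the only part of a cache-overflowing array
        // that is still resident.
        for (int i = data.Length - 1; i >= 0; i--)
            data[i] = DoSomethingElse(data[i]);
    }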


Spending more CPU cycles to make the data smaller so it fits in the CPU cache better can speed up a lot of systems, but most profilers will point you in the other direction.
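As a sketch of what I mean (the types here are made up for illustration): packing a small counter and a flag into one 16-bit field costs a mask on every read, but shrinks each record so that several times as many fit in every cache line.

    // About 8 bytes per record once the runtime pads it for alignment.
    struct WideRecord
    {
        public int Count;      // in practice never exceeds ~30,000
        public bool IsActive;
    }

    // 2 bytes per record: roughly four times as many fit in each cache line,
    // at the cost of a mask and a compare to unpack the fields.
    struct PackedRecord
    {
        private ushort _packed;   // low 15 bits = Count, top bit = IsActive

        public PackedRecord(int count, bool isActive)
        {
            _packed = (ushort)((count & 0x7FFF) | (isActive ? 0x8000 : 0));
        }

        public int Count => _packed & 0x7FFF;
        public bool IsActive => (_packed & 0x8000) != 0;
    }

A profiler that only counts CPU time sees the extra unpacking instructions and calls the packed version slower, even when the smaller working set makes the whole system faster.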

A: 

I may be misunderstanding your question, but I think the answer is simply to switch your profiler into a high-accuracy, low-detail mode. An example would be using ANTS Performance Profiler's new Sampling Mode:

http://www.simple-talk.com/community/blogs/andrewh/archive/2009/11/13/76420.aspx

Mel Harbour
Thanks, Sampling Mode has not been in most .net profilers until now.
Ian Ringrose
Yeah, see, that's where I would go the other way.
Mike Dunlavey
A: 

I have seen too many people spend a long time speeding up loops that a profiler says are slow, when in real life the cpu cache makes them fast.

Some profilers are really good at nonsense like that.

What's your overall goal? Do you want the computations to complete in less wall-clock time?

If not, ignore this answer.

If so, you need to know what's causing wall-clock time to be spent that you can get rid of.

It's not about accuracy of timing. It's about accuracy of location. I suggest what you really need to know is which lines of code are both 1) responsible for a significant fraction of the time being spent, and 2) could be done better or not at all. That's what you need to know, because if there are no such lines of code, what are you going to optimize?

An excellent way to find such lines of code is any profiler that 1) takes samples of the call stack on wall-clock time (not CPU time), and 2) tells you, for each line of code (not function) that appears on those call stacks, the percentage of stacks it appears on. Your candidate lines for optimization are among the lines with a large percentage. (A couple of non-.net examples: Zoom and LTProf.)
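This isn't any particular profiler's API, just a sketch of the counting step, assuming you have already captured a handful of stack-trace samples as plain text (one frame per line): for each frame, report the fraction of samples it appears on, counting it at most once per sample.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class StackSampleStats
    {
        // samples: one captured stack trace per entry, frames separated by newlines.
        public static Dictionary<string, double> FrameFractions(IReadOnlyList<string> samples)
        {
            var counts = new Dictionary<string, int>();
            foreach (var sample in samples)
            {
                // A HashSet so a frame that shows up twice in one stack
                // (recursion) is still only counted once for that sample.
                var frames = new HashSet<string>(
                    sample.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                          .Select(f => f.Trim()));
                foreach (var frame in frames)
                    counts[frame] = counts.TryGetValue(frame, out var c) ? c + 1 : 1;
            }
            return counts.ToDictionary(kv => kv.Key,
                                       kv => (double)kv.Value / samples.Count);
        }
    }

A line that turns up on, say, 30% of the samples is costing you roughly 30% of wall-clock time, whatever a timing-oriented profiler says about it.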

Frankly, the profiler I use is one you already have. I just pause the program while it's being slow and look at the stack. I don't need a lot of samples. In fact, if there's a line of code I could do without and it appears on as few as two samples, I know it's worth fixing, and the fewer samples it took to get to that point, the bigger it is. Here's a more thorough explanation.

There are almost always multiple "bottlenecks", so I find a big one, fix it, and do it all again. What fixing a bottleneck does to the remaining bottlenecks is make them bigger as a fraction of the now-shorter run time. (For example, if two problems each take 30% of a 10-second run, fixing the first leaves a 7-second run in which the second is now about 43%.) This "magnification effect" lets you keep going until there is simply no more speed to squeeze out.

Mike Dunlavey