All the .NET profilers I know of don't take into account the effect of the CPU cache.
Given that reading a field from the CPU cache can be 100 times faster than reading it from main memory, this can be a big factor. (I just had to explain this in an answer.)
I have seen too many people spend a long time speeding up loops that a profiler says are slow, when in real life the CPU cache makes them fast.
For example, I would like to be able to see whether a data access is missing the CPU cache a lot, as well as getting basic profiling results I can trust more.
In the past I have found that making my data more compact so that it all fits in the CPU cache, or changing the order in which the data is accessed, can have a big effect. E.g.
AccessArrayFromStartAndDoSomething()
AccessArrayFromEndAndDoSomethingElse()
is better than
AccessArrayFromStartAndDoSomething()
AccessArrayFromStartAndDoSomethingElse()
if the array will not fit in the CPU cache, but it is very hard to find that type of improvement.
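
To make the access-order idea concrete, here is a minimal sketch in C#. The class and method names (TraversalOrderSketch, SumFromStart, CountEvenFromEnd, CountEvenFromStart) are made up for illustration, and the work done in each pass is arbitrary. Whether the backward second pass is actually faster depends on the cache size, the replacement policy and the hardware prefetcher, so treat it as something to measure rather than a guaranteed win.

```csharp
using System;

static class TraversalOrderSketch
{
    // First pass: walks the array from the start, so the elements near the
    // end are the ones most recently loaded into the cache when it finishes.
    public static long SumFromStart(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    // Second pass, variant A: starts at the end, where the data is most
    // likely still cached from the first pass.
    public static int CountEvenFromEnd(int[] data)
    {
        int count = 0;
        for (int i = data.Length - 1; i >= 0; i--)
            if ((data[i] & 1) == 0) count++;
        return count;
    }

    // Second pass, variant B: starts again at the beginning. If the array is
    // larger than the cache, those elements have probably been evicted by
    // the time the first pass reached the end.
    public static int CountEvenFromStart(int[] data)
    {
        int count = 0;
        for (int i = 0; i < data.Length; i++)
            if ((data[i] & 1) == 0) count++;
        return count;
    }
}
```

Both variants of the second pass return the same result; only the order in which memory is touched differs, which is exactly the kind of difference a typical profiler will not explain.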
Spending more CPU cycles to make the data smaller so it fits in the CPU cache better can speed up a lot of systems, but most profilers will point you in the other direction.
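
As one example of trading CPU cycles for smaller data, the sketch below packs boolean flags one per bit instead of one per byte (as in a bool[]). PackedFlags is a hypothetical class written for this post; the built-in System.Collections.BitArray works along similar lines. Every read and write costs an extra shift and mask, which a profiler may report as overhead, yet the data is roughly 8 times smaller and so more of it can stay in the cache.

```csharp
using System;

// Hypothetical illustration: stores one flag per bit in an array of 64-bit words.
sealed class PackedFlags
{
    private readonly ulong[] _words;

    public int Length { get; }

    public PackedFlags(int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        _words = new ulong[(length + 63) / 64]; // round up to whole words
    }

    public bool this[int index]
    {
        get
        {
            CheckIndex(index);
            // Extra work per read: pick the word, then mask out the bit.
            return (_words[index >> 6] & (1UL << (index & 63))) != 0;
        }
        set
        {
            CheckIndex(index);
            ulong mask = 1UL << (index & 63);
            if (value) _words[index >> 6] |= mask;
            else _words[index >> 6] &= ~mask;
        }
    }

    // Bytes used by the flag storage itself (ignores object headers).
    public int SizeInBytes => _words.Length * sizeof(ulong);

    private void CheckIndex(int index)
    {
        if (index < 0 || index >= Length)
            throw new ArgumentOutOfRangeException(nameof(index));
    }
}
```

For 100 flags this uses two 64-bit words (16 bytes) where a bool[] would use 100 bytes of element storage. Whether the smaller footprint outweighs the extra instructions depends on the access pattern and the hardware, so it is worth timing both versions on realistic data.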