I am facing a performance issue on a multi-core (8+ cores) machine with software written in C++ using Visual Studio on Windows XP.
I realized that I have no idea how my L1 and L2 caches are performing, or what CPU-to-memory bandwidth my application actually achieves.
I have tried several tools (including VTune, Glowcode, etc.), but all of them fail when run under load on a multi-core machine, which is exactly the situation I need them for!
Can you suggest another tool that doesn't need fancy graphs but can at least give me a few indications of my cache/memory performance? Alternatively, can you suggest code snippets for manually instrumenting my application?
Thanks!