I want to count several CPU instructions in my code, e.g. I would like to know how many additions, multiplications, floating-point operations, and branches my code executes. I currently use gprof under Linux to profile my C++ code, but it only gives the number of calls to my functions, and I manually estimate the number of instructions. Are there any tools that might do the trick for me? Maybe some virtual machine?

+5  A: 

If you really need to count instructions then you are probably best off generating assembler and then passing the output to an intelligent grep equivalent. For gcc, try the -S switch.
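A minimal sketch of that "intelligent grep" idea (file names are placeholders): emit assembly with the compiler's `-S` switch, then tally the first token of each instruction line. The pipeline is demonstrated here on a tiny inline assembly sample so it is self-contained; in practice you would feed it the generated `.s` file.

```shell
# Emit assembly for the translation unit (prog.cpp is a placeholder name):
#   g++ -O2 -S -o prog.s prog.cpp
# Then tally mnemonics: keep indented instruction lines (skipping labels
# and assembler directives), take the first token, and count occurrences.
# Shown here on a small inline sample instead of a real prog.s:
printf '  movl $1, %%eax\n  addl %%ebx, %%eax\n  addl $4, %%esp\n  imull %%ecx\n' |
  grep -E '^[[:space:]]+[a-z]' | awk '{ print $1 }' | sort | uniq -c | sort -rn
# addl appears twice in this sample, movl and imull once each
```

Note this counts instructions *statically* in the generated code, not how many times each one executes at run time; for dynamic counts you need a tool that runs the program (see the Callgrind answer).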

Charles Bailey
+2  A: 

You may be able to use Valgrind's Callgrind with the --dump-instr=yes flag to achieve this.
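For reference, a typical invocation looks like this (the program name is a placeholder, and the profile file name carries the process id, so `<pid>` stays as written):

```shell
# Run the program under Callgrind, recording costs per instruction:
valgrind --tool=callgrind --dump-instr=yes ./prog

# Summarize the resulting profile (written as callgrind.out.<pid>):
callgrind_annotate callgrind.out.<pid>

# Or browse it interactively, down to the assembly level:
kcachegrind callgrind.out.<pid>
```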

Hasturkun
It doesn't directly count instructions, but it helps browse assembly in a very convenient way. Thanks.
Atilla Filiz
A: 

Just out of curiosity, is instruction-count a useful way to profile code performance?

I know that back in the days of "simple" CPU designs, you could reasonably assume that each opcode would take exactly so-many nanoseconds of CPU time to execute, but these days, with all the complex memory caching schemes, on-the-fly opcode reordering, pipelining, superscalar architecture, and everything else that's been thrown into the modern CPU, does simply counting opcode executions still give a good indication of how long the code will take to run? Or will execution time vary as much based on (for example) memory access patterns and the sequence in which opcodes are executed as it will on the raw frequency of the opcodes' execution?

My suspicion is that the only way to reliably predict code performance these days is to actually run the code on the target architecture and time it. Often, when it seems like the compiler has emitted inefficient code, it's actually doing something clever that takes advantage of a subtle feature of the modern CPU architecture.

Jeremy Friesner
Indeed, there are more variables than ever and an accurate prediction is hard, and there are many factors likely more important than instruction count. Nonetheless, certain relationships have been true and will remain true for the foreseeable future, like cost(add) <= cost(multiply) <= cost(divide) <= cost(square root). Replacing a divide with a multiply is unlikely to hurt performance and may help.
George Phillips
You are correct performance-wise. However, my goal is not to optimize my code for performance but to do other types of analysis (instruction types and frequency).
Atilla Filiz
+1  A: 

Intel's VTune is free for Linux users, AFAIK (assuming we're talking about an Intel-based x86 Linux machine). It will give you all the info you need and SOOO much more.

Goz
Like the compiler, VTune is only free for 30 days as an "evaluation." It's listed as being $699. Quite a far cry from "free."
greyfade
My mistake, I'm sure it USED to be free for Linux users... or maybe I'm just going mad...
Goz
+3  A: 

This is general advice, not Linux-specific: you should be interested in CPU cycles instead. Forget about the number of instructions as a measure of performance; one instruction may cost as much as ten others put together, so the count alone won't tell you anything.

You should focus on CPU cycles and, in multithreaded environments (most if not all today), on the time the thread is put to sleep ("switched out"), which will give you an idea of how much time is spent waiting for I/O, the database, etc. to complete, and how that impacts CPU privileged time.

Ariel
+1 And memory access (cache hits/misses) can play a great role in determining actual performance, while the number of operations tells you nothing about it.
sharptooth