I want to count several kinds of CPU instructions in my code. For example, I would like to know how many additions, multiplications, floating-point operations, and branches my code executes. I currently use gprof under Linux to profile my C++ code, but it only gives the number of calls to my functions, so I estimate the number of instructions by hand. Are there any tools that might do the trick for me? Maybe some virtual machine?
If you really need to count instructions then you are probably best off generating assembler and then passing the output to an intelligent grep equivalent. For gcc, try the -S switch.
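As a rough illustration of the "intelligent grep" idea, here is a minimal sketch that tallies mnemonics in the text produced by `g++ -S`. The function name `count_mnemonics` is made up for this example, and it assumes GNU assembler syntax as emitted for x86 (directives start with `.`, labels end with `:`, `#` starts a comment). Note that this is a static count of the instructions the compiler emitted, not of how many times each one is executed at run time.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: tally instruction mnemonics in assembly text such as
// the output of `g++ -S file.cpp`.
// Assumptions: GNU assembler syntax for x86, where directives and local
// labels start with '.', labels end with ':', and '#' starts a comment.
// Prefixes such as "rep" or "lock" are counted as the mnemonic of their line.
std::map<std::string, int> count_mnemonics(std::istream& in) {
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing comment, if any (find() returns npos when absent,
        // and substr(0, npos) keeps the whole line).
        line = line.substr(0, line.find('#'));

        // The first whitespace-separated token decides what the line is.
        std::istringstream tokens(line);
        std::string first;
        if (!(tokens >> first)) continue;   // blank line
        if (first[0] == '.') continue;      // directive or local label
        if (first.back() == ':') continue;  // ordinary label

        ++counts[first];                    // treat it as a mnemonic
    }
    return counts;
}
```

To use it, open the generated `.s` file with `std::ifstream`, pass the stream to `count_mnemonics`, and print the resulting map. Grouping mnemonics into categories (additions, multiplications, branches) is left out because the mnemonic names depend on the target architecture.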
Just out of curiosity, is instruction count a useful way to profile code performance?
I know that back in the days of "simple" CPU designs, you could reasonably assume that each opcode would take exactly so-many-nanoseconds of CPU time to execute. But these days, with all the complex memory caching schemes, on-the-fly opcode re-ordering, pipelining, superscalar architecture, and everything else that's been thrown into the modern CPU, does simply counting opcode executions still give a good indication of how long the code will take to run? Or will execution time vary as much based on (for example) memory access patterns and the sequence in which opcodes are executed as it will on the raw frequency of the opcodes' execution?
My suspicion is that the only way to reliably predict code performance these days is to actually run the code on the target architecture and time it. Often, when it seems like the compiler has emitted inefficient code, it's actually doing something clever that takes advantage of a subtle feature of the modern CPU architecture.
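For what "run it and time it" can look like in practice, here is a minimal sketch using `std::chrono::steady_clock`. The names `sum_of_squares` and `best_time_ns` are hypothetical, the workload is only a stand-in for the real code under test, and taking the fastest of several runs is one common convention rather than the only sensible one.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical workload standing in for the code you actually want to time.
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) total += i * i;
    return total;
}

// Hypothetical helper: call `work` `repeats` times and return the fastest
// wall-clock time in nanoseconds, or -1 if `repeats` is not positive.
// The fastest run is the one least disturbed by other activity on the machine.
template <typename F>
std::int64_t best_time_ns(F&& work, int repeats) {
    std::int64_t best = -1;
    for (int r = 0; r < repeats; ++r) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
                .count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

A call might look like `best_time_ns([&] { sink = sum_of_squares(1000000); }, 5)`, where `sink` is a `volatile std::uint64_t` declared by the caller so that the optimizer cannot discard the computation. Build with the same compiler flags as the real program, since timings from an unoptimized build say little about the optimized one.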
Intel's VTune is free for Linux users, AFAIK (assuming we're talking about an Intel-based x86 Linux machine). It will give you all the info you need and so much more.
This is general advice, not Linux-specific: you should be interested in CPU cycles instead. Forget about the number of instructions as a measure of performance. One instruction may cost as much as ten others combined, so the count by itself won't tell you anything.
You should focus on CPU cycles and, in multithreaded environments (most if not all today), on the time the thread spends asleep ("switched out"). That gives you an idea of how much time is spent waiting for I/O, the database, etc. to complete, and of how that waiting affects privileged CPU time.