views: 677
answers: 7
I just asked a question about how the compiler optimizes certain C++ code, and I have been looking around SO for questions about how to verify that the compiler has performed a given optimization. I tried reading the assembly listing generated with g++ (g++ -c -g -O2 -Wa,-ahl=file.s file.c) to see what is going on under the hood, but the output is too cryptic for me. What techniques do people use to tackle this problem, and are there any good references on interpreting the assembly listings of optimized code, or articles specific to the GCC toolchain that cover it?

+3  A: 

Not gcc, but when debugging in Visual Studio you have the option to intersperse assembly and source, which gives a good idea of what was generated for each statement. Sometimes, though, it's not quite aligned correctly.

The output of the gcc toolchain and of objdump -dS is not at the same granularity. This article on getting gcc to output source and assembly uses the same options as you are.

Pete Kirkham
Are you sure you can use interspersed assembly and code when you're compiling with optimizations? I remember once having a problem with that.
Edan Maor
I don't have a Windows machine where I am at the moment, so I can't confirm, but I don't recall it being an issue.
Pete Kirkham
Yes, it works even with optimizations. Of course, there is no longer a strict 1 to 1 mapping, but the compiler still does its best to intersperse the source code, and in general it works pretty well.
jalf
+10  A: 

GCC's optimization passes work on an intermediate representation of your code in a format called GIMPLE.

Using the -fdump-* family of options, you can ask GCC to output the intermediate states of the tree after each pass.

For example, feed this to gcc -c -fdump-tree-all -O3

unsigned fib(unsigned n) {
    if (n < 2) return n;
    return fib(n - 2) + fib(n - 1);
}

and watch as it gradually transforms from a simple exponential algorithm into a complex polynomial algorithm. (Really!)
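One transformation you can watch happen in those dumps is tail-call elimination: the second recursive call is in tail position, and turning it into a loop gives code roughly equivalent to this hand-written sketch (an illustration of the effect, not GCC's literal output; the function name is made up):

```cpp
// Hand-written equivalent of fib() after tail-call elimination:
// the fib(n - 1) call in tail position becomes a loop, leaving
// only the fib(n - 2) call as genuine recursion.
unsigned fib_after_tailcall(unsigned n) {
    unsigned acc = 0;
    while (n >= 2) {
        acc += fib_after_tailcall(n - 2);  // remaining recursive call
        n -= 1;                            // loop replaces fib(n - 1)
    }
    return acc + n;  // n is now 0 or 1, the base case
}
```

Diffing consecutive -fdump-tree-* files around the tail-recursion pass shows this rewrite in GIMPLE form.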

ephemient
That's a neat trick. If only I understood the output!
Victor Liu
+1 coooooooooooooooool!
Autopulated
A: 

Victor, in your case the optimization you are looking for is just a smaller allocation of local memory on the stack. You should see a smaller allocation at function entry and a smaller deallocation at function exit if the space used by the empty class is optimized away.
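As a minimal sketch of what to look for (the type and function names are made up for illustration): compile something like this at -O0 and at -O2 and compare the stack adjustment at function entry, e.g. the `sub $..., %rsp` instruction on x86-64.

```cpp
struct Empty {};  // no data members; sizeof(Empty) is still 1

// At -O0 the unused local may get a stack slot; at -O2 the compiler
// should eliminate it entirely, shrinking (or removing) the stack
// allocation at function entry.
int uses_empty_local(int x) {
    Empty e;   // candidate for complete removal by the optimizer
    (void)e;   // silence unused-variable warnings
    return x + 1;
}
```

Running `g++ -S -O0 file.cpp` and `g++ -S -O2 file.cpp` gives two small listings you can diff side by side.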

As for the general question, I've been reading (and writing) assembly language for more than (gulp!) 30 years and all I can say is that it takes practice, especially to read the output of a compiler.

Richard Pennington
+1  A: 

Instead of trying to read through an assembler dump, run your program inside a debugger. You can pause execution, single-step through instructions, set breakpoints on the code you want to check, etc. Many debuggers can display your original C code alongside the generated assembly so you can more easily see what the compiler did to optimize your code.
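With gdb, for example, a session might look like this sketch (the function name `compute` is hypothetical; `disassemble /s`, `stepi`, and `info registers` are standard gdb commands):

```
$ g++ -g -O2 example.cpp -o example
$ gdb ./example
(gdb) break compute        # stop where the optimization should show up
(gdb) run
(gdb) disassemble /s       # interleave source lines with the assembly
(gdb) stepi                # advance one machine instruction
(gdb) info registers       # inspect machine state between instructions
```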

Also, if you are trying to test a specific compiler optimization, you can create a short dummy function containing the type of code that fits the optimization you're interested in (and not much else; the simpler it is, the easier the assembly is to read). Compile the program once with optimizations on and once with them off; comparing the generated assembly for the dummy function between the two builds should show you what the compiler's optimizers did.
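A minimal sketch of such a dummy function, here targeting constant folding and loop elimination (the file name in the comments is hypothetical):

```cpp
// Dummy function containing only the pattern under test. At -O2,
// GCC typically folds the whole loop to a constant, so the body
// becomes the equivalent of "return 4950"; at -O0 the loop survives.
int sum_to_99() {
    int total = 0;
    for (int i = 0; i < 100; ++i)
        total += i;
    return total;
}

// Generate and compare the two listings:
//   g++ -S -O0 dummy.cpp -o dummy-O0.s
//   g++ -S -O2 dummy.cpp -o dummy-O2.s
//   diff dummy-O0.s dummy-O2.s
```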

bta
+1  A: 

Adding the assembler's -L option, which keeps local symbols in the listing (e.g., g++ -c -O2 -Wa,-L,-ahl=file.s file.cpp), may produce slightly more intelligible listings.

The equivalent MSVC option is /FAcs (and it's a little better because it intersperses the source, assembly, and machine code, and includes some helpful comments).


About one third of my job consists of doing just what you're doing: juggling C code around and then looking at the assembly output to make sure it's been optimized correctly (which is preferred to just writing inline assembly all over the place).

Game-development blogs and articles can be a good resource on the topic, since games are effectively real-time applications in constant memory -- I have some notes on it, as does Mike Acton, among others. I usually keep Intel's instruction set reference open in a window while going through listings.

The most helpful thing is to get a good ground-level understanding of assembly programming generally first -- not because you want to write assembly code, but because having done so makes reading disassembly much easier. I've had a hard time finding a good modern textbook though.

Crashworks
+2  A: 

A useful technique is to run the code under a good sampling profiler, e.g. Zoom under Linux or Shark under Mac OS X. These profilers not only show you the hotspots in your code but also map source code to disassembled object code. Highlighting a source line shows the (not necessarily contiguous) lines of generated code that map to the source line (and vice versa). Online opcode references and optimization tips are a nice bonus.

Paul R
Zoom ( http://rotateright.com ) can also statically analyze an ELF binary, so you don't even need to run/profile the code in order to just verify that the compiler has generated the asm you expected.
XWare
@XWare: good point - and Shark on Mac OS X also has this capability
Paul R
A: 

Zoom from RotateRight ( http://rotateright.com ) is mentioned in another answer, but to expand on that: it shows you the mapping of source to assembly in what they call the "code browser". It's incredibly handy even if you're not an asm expert because they have also integrated assembly documentation into the app. And the assembly listing is annotated with comments and timing for several CPU types.

You can just open your object or executable file with Zoom and take a look at what the compiler has done with your code.

JanePhanie