Hi All,

Wanting to see the output of the compiler (in assembly) for some C code, I wrote a simple program in C and generated its assembly file using gcc.

The code is this:

#include <stdio.h>  

int main()  
{  
    int i = 0;

    if ( i == 0 )
    {
        printf("testing\n");
    }

    return 0;  
}  

The generated assembly for it is here (only the main function):

_main:  
pushl   %ebp  
movl    %esp, %ebp  
subl    $24, %esp  
andl    $-16, %esp  
movl    $0, %eax  
addl    $15, %eax  
addl    $15, %eax  
shrl    $4, %eax  
sall    $4, %eax  
movl    %eax, -8(%ebp)  
movl    -8(%ebp), %eax  
call    __alloca  
call    ___main  
movl    $0, -4(%ebp)  
cmpl    $0, -4(%ebp)  
jne L2  
movl    $LC0, (%esp)  
call    _printf  
L2:  
movl    $0, %eax  
leave  
ret  

I am at an absolute loss to correlate the C code with the assembly code. All the code has to do is store 0 in a register, compare it with the constant 0, and take suitable action. So what is all the rest of the assembly doing?

Thanks in advance.

+1  A: 

You need some knowledge of assembly language to understand the assembly generated by a C compiler.

This tutorial might be helpful

Upul
I am aware of the basic assembly syntax... but what I want to understand is why the assembly code is doing so many things...
puffadder
+3  A: 

Don't worry about the preamble/postamble - the part you're interested in is:

movl    $0, -4(%ebp)  
cmpl    $0, -4(%ebp)  
jne L2  
movl    $LC0, (%esp)  
call    _printf  
L2: 

It should be pretty self-evident as to how this correlates with the original C code.
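Spelled out as a rough sketch, the same branch can be written in C with a goto standing in for the jne. The function name branch_like_asm, the parameter i (the original stores a constant 0 in a local instead), and the called flag are all made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical goto-style rendering of the snippet above.
   Returns 1 if the printf call was reached, 0 if it was skipped. */
int branch_like_asm(int i)
{
    int called = 0;
    if (i != 0)              /* cmpl $0, -4(%ebp) ; jne L2 */
        goto L2;
    printf("testing\n");     /* movl $LC0, (%esp) ; call _printf */
    called = 1;
L2:
    assert(called == (i == 0));  /* printf runs exactly when i is 0 */
    return called;
}
```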

Paul R
what is the preamble and postamble? why does the compiler need to put it in the first place?
puffadder
Because of the overhead associated with the function main()
Mawg
@puffadder: the preamble/postamble is just generic boiler-plate code for setting up the stack and various registers. Remember that main typically takes two parameters and returns a function result.
Paul R
+2  A: 

The first part is some initialization code, which does not make any sense in the case of your simple example. This code would be removed with an optimization flag.

The last part can be mapped to C code:

movl    $0, -4(%ebp)    // put 0 into variable i (located at -4(%ebp))
cmpl    $0, -4(%ebp)    // compare variable i with value 0
jne L2                  // if they are not equal, skip to after the printf call
movl    $LC0, (%esp)    // put the address of "testing\n" at the top of the stack
call    _printf         // do call printf
L2:  
movl    $0, %eax        // return 0 (calling convention: %eax has the return code)
Jerome
+2  A: 

Well, much of it is the overhead associated with the function. main() is just a function like any other, so it has to save the caller's frame pointer and set up its own stack frame at the start, set up the return value at the end, etc.

I would recommend using GCC to generate mixed source code and assembler, which will show you the assembler generated for each source line.

If you want to see the C code together with the assembly it was converted to, use a command line like this:

gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst

See http://www.delorie.com/djgpp/v2faq/faq8_20.html

On Linux, just use gcc. On Windows, download Cygwin http://www.cygwin.com/


Edit - see also this question http://stackoverflow.com/questions/1289881/using-gcc-to-produce-readable-assembly

and http://oprofile.sourceforge.net/doc/opannotate.html

Mawg
+1  A: 

See here for more information. You can generate the assembly code with C comments for better understanding.

gcc -g -Wa,-adhls your_c_file.c > your_asm_file.s

This should help you a little.

Iulian Şerbănoiu
+3  A: 

Since main is special, you can often get better results by doing this type of thing in another function (preferably in its own file with no main). For example:

#include <stdio.h>

void foo(int x) {
    if (x == 0) {
       printf("testing\n");
    }
}

would probably be much more clear as assembly. Doing this would also allow you to compile with optimizations and still observe the conditional behavior. If you were to compile your original program with any optimization level above 0 it would probably do away with the comparison since the compiler could go ahead and calculate the result of that. With this code part of the comparison is hidden from the compiler (in the parameter x) so the compiler can't do this optimization.

What the extra stuff actually is

_main:  
pushl   %ebp  
movl    %esp, %ebp  
subl    $24, %esp  
andl    $-16, %esp

This is setting up a stack frame for the current function. In x86 a stack frame is the area between the stack pointer's value (SP, ESP, or RSP for 16, 32, or 64 bit) and the base pointer's value (BP, EBP, or RBP). This is supposedly where local variables live, but not really, and explicit stack frames are optional in most cases. The use of alloca and/or variable length arrays would require their use, though.

This particular stack frame construction is different than for non-main functions because it also makes sure that the stack is 16 byte aligned. The subtraction from ESP increases the stack size by more than enough to hold local variables and the andl effectively subtracts from 0 to 15 from it, making it 16 byte aligned. This alignment seems excessive except that it would force the stack to also start out cache aligned as well as word aligned.
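The effect of andl $-16, %esp can be reproduced in C as a rough sketch. The helper name align_down_16 is invented for illustration, and any values passed to it are arbitrary numbers, not real stack addresses from this program:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mimics "andl $-16, %esp" by clearing the low
   four bits, which rounds the value down to a multiple of 16. */
uint32_t align_down_16(uint32_t esp)
{
    uint32_t aligned = esp & (uint32_t)-16;  /* -16 is 0xFFFFFFF0 */
    assert(aligned % 16 == 0);               /* result is 16-byte aligned */
    assert(esp - aligned <= 15);             /* moved down by 0 to 15 bytes */
    return aligned;
}
```

For example, 0x100F becomes 0x1000, while a value that is already a multiple of 16 is left unchanged.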

movl    $0, %eax  
addl    $15, %eax  
addl    $15, %eax  
shrl    $4, %eax  
sall    $4, %eax  
movl    %eax, -8(%ebp)  
movl    -8(%ebp), %eax  
call    __alloca  
call    ___main 

I don't know what all this does. alloca increases the stack frame size by altering the value of the stack pointer.
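The arithmetic before the call can at least be followed by hand. A small C sketch, assuming the value left in %eax is what gets handed to __alloca (the function name alloca_arg is made up here):

```c
#include <assert.h>

/* Hypothetical replay of the instruction sequence, one line per instruction. */
unsigned int alloca_arg(void)
{
    unsigned int eax = 0;   /* movl $0, %eax  */
    eax += 15;              /* addl $15, %eax */
    eax += 15;              /* addl $15, %eax */
    eax >>= 4;              /* shrl $4, %eax  : 30 / 16 = 1 */
    eax <<= 4;              /* sall $4, %eax  : 1 * 16 = 16 */
    assert(eax % 16 == 0);  /* shift right then left leaves a multiple of 16 */
    return eax;
}
```

So the value stored at -8(%ebp) and reloaded into %eax works out to 16.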

movl    $0, -4(%ebp)  
cmpl    $0, -4(%ebp)  
jne L2  
movl    $LC0, (%esp)  
call    _printf  
L2:  
movl    $0, %eax  

I think you know what this does. If not, the movl just before the call is moving the address of your string into the top location of the stack so that it may be retrieved by printf. It must be passed on the stack so that printf can use its address to infer the addresses of printf's other arguments (if any, which there aren't in this case).

leave  

This instruction removes the stack frame talked about earlier. It is essentially movl %ebp, %esp followed by popl %ebp. There is also an enter instruction which can be used to construct stack frames, but gcc didn't use it. When stack frames aren't explicitly used, EBP may be used as a general purpose register, and instead of leave the compiler would just add the stack frame size to the stack pointer, which would decrease the stack size by the frame size.

ret

I don't need to explain this.

When you compile with optimizations

I'm sure you will recompile all of this with different optimization levels, so I will point out something that may happen that you will probably find odd. I have observed gcc replacing printf and fprintf with puts and fputs, respectively, when the format string did not contain any % and there were no additional parameters passed. This is because (for many reasons) it is much cheaper to call puts and fputs, and in the end you still get what you wanted printed.
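As a sketch of why that substitution is harmless for this program: the two calls below write the same bytes, since puts appends the newline itself. The function names are made up for illustration, and this is not how gcc performs the replacement internally:

```c
#include <assert.h>
#include <stdio.h>

/* What the source says. Returns the number of characters written. */
int say_with_printf(void)
{
    return printf("testing\n");
}

/* What an optimizing gcc may call instead; puts adds the trailing newline. */
int say_with_puts(void)
{
    int rc = puts("testing");
    assert(rc != EOF);      /* puts returns EOF only on a write error */
    return rc;
}
```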

nategoose
Roger that. Exactly what I wanted to know !!!
puffadder
The call to `__alloca` is just doing the initialisation for the `alloca()` subsystem.
caf
@caf: Could you give any more info as to what that means?
nategoose