Hello everyone,

I decided it would be fun to learn x86 assembly during the summer break. So I started with a very simple hello world program, borrowing from the examples that gcc -S could give me. I ended up with this:

HELLO:
    .ascii "Hello, world!\12\0"
    .text

.globl _main
_main:
    pushl   %ebp        # 1. puts the base stack address on the stack
    movl    %esp, %ebp  # 2. puts the base stack address in the stack address register
    subl    $20, %esp   # 3. ???
    pushl   $HELLO      # 4. push HELLO's address on the stack
    call    _puts       # 5. call puts
    xorl    %eax, %eax  # 6. zero %eax, probably not necessary since we didn't do anything with it
    leave               # 7. clean up
    ret                 # 8. return
                        # PROFIT!

It compiles and even works! And I think I understand most of it.

However, magic happens at step 3. If I remove this line, my program dies between the call to puts and the xor with a misaligned stack error. And if I change $20 to another value, it crashes too. So I came to the conclusion that this value is very important.

Problem is, I don't know what it does and why it's needed.

Can anyone explain it to me? (I'm on Mac OS, in case that matters.)

+3  A: 

The general form of the comment should be "Allocates space for local variables". I'm not sure why changing it arbitrarily would crash the program; I can only see it crashing if you reduce it. And the proper comment for 6 is "Prepare to return a 0 from this function".

Ignacio Vazquez-Abrams
So return values are passed in `%eax`? I thought it would always go on the stack since they can potentially be bigger than 32 bits. And also, why do I need to allocate 24 bytes if I'm only going to use 4 of them? (__EDIT__ it works with 4, too. So I guess the stack has to be aligned on a certain boundary.)
zneak
Sounds like it's crashing from an alignment issue then, not a stack overflow. Values are returned in edx:eax, eax or a slice thereof, or an FPU register.
Ignacio Vazquez-Abrams
I'm pretty sure that the stack pointer has to be aligned on multiples of a DWORD (4 bytes) on x86 because it's 32-bit.
erjiang
@mazin k.: It actually has to be aligned on multiples of 16 bytes, as academicRobot mentioned.
zneak
It wasn't crashing from a stack overflow (that would be pretty awful considering it's a non-obfuscated hello world program). I mentioned in the question that it was a stack misalignment issue; assembler is still full of mysteries and problems you don't have with higher-level languages, so I didn't quite catch the problem at first.
zneak
+2  A: 

On x86 OS X, the stack needs to be 16-byte aligned at the point of a function call, see ABI doc here. So, the explanation is:

return address of _main (pushed by its caller)  -4
push base pointer (#1)                          -4
strange decrement (#3)                         -20
push argument (#4)                              -4
total at the moment of the call (#5)           -32

To check, change line #3 from $20 to $4, which also works.

Also, as Ignacio Vazquez-Abrams points out, #6 is not optional. Registers contain remnants of previous calculations, so %eax has to be zeroed explicitly to return 0.

I recently learned (and am still learning) assembly, too. To save you the shock, 64-bit calling conventions are MUCH different (parameters are passed in registers). I found this very helpful for 64-bit assembly.

academicRobot
+1  A: 

Note that if you compile with -fomit-frame-pointer some of that %ebp pointer boilerplate will disappear. The base pointer is helpful for debugging but isn't actually necessary on x86.

Also I highly recommend using Intel syntax, which is supported by all the GCC/binutils stuff. I used to think that the difference between AT&T and Intel syntax was just a matter of taste, but then one day I came across this example where the AT&T mnemonic is just totally different from the Intel one. And since all the official x86 documentation uses Intel syntax, it seems like a better way to go.

Have fun!

Josh Haberman
Thank you very much.
zneak