GCC compiles (using gcc --omit-frame-pointer -S):

    int the_answer() { return 42; }

into

        .text
    .globl _the_answer
    _the_answer:
        subl $12, %esp
        movl $42, %eax
        addl $12, %esp
        ret
        .subsections_via_symbols

What is the '$12' constant doing here, and what is the '%esp' register?

+6  A: 

Short answer: stack frames.

Long answer: when you call a function, compilers will manipulate the stack pointer to allow for local data such as function variables. Since your code is changing esp, the stack pointer, that's what I assume is happening here. I would have thought GCC smart enough to optimize this away where it's not actually required, but you may not be using optimization.
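To illustrate what "local data" means here, this is a small sketch of a function that genuinely needs stack space. The function name and array size are made up for the example, and whether the array actually lives on the stack depends on the compiler and optimization level.

```c
/* Illustrative only: a function whose locals need storage.
 * Without optimization, a compiler will typically reserve room for
 * `squares` and `total` by lowering the stack pointer on entry and
 * raising it again before returning. */
int sum_first_squares(int n) {
    int squares[8];   /* local array: 8 ints of per-call storage */
    int total = 0;

    if (n > 8) {
        n = 8;        /* never write past the end of the array */
    }
    for (int i = 0; i < n; i++) {
        squares[i] = (i + 1) * (i + 1);
    }
    for (int i = 0; i < n; i++) {
        total += squares[i];
    }
    return total;
}
```

By contrast, the_answer() has no locals at all, which is why the adjustment of esp in the question looks unnecessary.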

paxdiablo
+1  A: 

Using GCC 4.3.2 I get this for the function:

the_answer:
    movl $42, %eax
    ret

...plus surrounding junk, by using the following command line:

    echo 'int the_answer() { return 42; }' | gcc --omit-frame-pointer -S -x c -o - -

Which version are you using?

Ant P.
4.0.1 (Apple Inc. build 5488). Guess it's a bug.
Mike Douglas
@Mike, not a bug. The code works fine since the subl is reversed by the addl. It's inefficient but definitely not a bug.
paxdiablo
Might not be a bug, it could just be that 4.3 is smarter at figuring out which instructions are safe to remove.
Ant P.
Disappears with '-O3'. Maybe Apple's GCC has a lower default optimization level?
Mike Douglas
Depends on whether Apple developers thought it was worth it. It's not much of a gcc issue itself though. The backend writers need to think about that.
Johannes Schaub - litb
Didn't Apple switch over to clang?
JUST MY correct OPINION
+1  A: 
_the_answer:
    subl    $12, %esp
    movl    $42, %eax
    addl    $12, %esp
    ret

The first subl decrements the stack pointer to make room for variables that may be used in your function. One slot may be used for the frame pointer and another to hold the return address, for example. You said it should omit the frame pointer. That usually means it omits the loads/stores that save and restore the frame pointer, but the code will often still reserve memory for it. The reason is that this makes code that analyzes the stack much easier: if every frame has a minimum size, you know you can always access FP+0x12 to get at the first local variable slot, even if you omit saving the frame pointer.

As far as I know, eax on x86 is used to hand the return value back to the caller. The last addl just destroys the frame previously created for your function.
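As a rough illustration of those three instructions, here is a toy C model of the two registers involved. The struct and function names are invented for this sketch; it only mimics the arithmetic and is not how the CPU or GCC represents anything.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (hypothetical): just the two registers the listing touches. */
struct cpu_model {
    uint32_t esp;  /* stack pointer */
    uint32_t eax;  /* return value register */
};

/* Mirrors the body of _the_answer, one statement per instruction. */
void the_answer_model(struct cpu_model *c) {
    uint32_t entry_esp = c->esp;

    c->esp -= 12;  /* subl $12, %esp : reserve 12 bytes (stack grows down) */
    c->eax = 42;   /* movl $42, %eax : place the return value */
    c->esp += 12;  /* addl $12, %esp : release the 12 bytes again */

    /* ret expects esp to be back where it was on entry */
    assert(c->esp == entry_esp);
}
```

After the call, esp has its original value and eax holds 42, which is why the extra subl/addl pair is wasteful but harmless.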

The instructions at the start and end of a function are called the "prologue" and "epilogue" of the function. Here is what my port does when it has to create the prologue of a function in GCC (it's way more complex for real-world ports that intend to be as fast and versatile as possible):

void eco32_prologue(void) {
    int i, j;
    /* reserve space for all callee saved registers, and 2 additional ones:
     * for the frame pointer and return address */
    int regs_saved = registers_to_be_saved() + 2;
    int stackptr_off = (regs_saved * 4 + get_frame_size());

    /* decrement the stack pointer */
    emit_move_insn(stack_pointer_rtx, 
                   gen_rtx_MINUS(SImode, stack_pointer_rtx, 
                                 GEN_INT(stackptr_off)));

    /* save return address, if we need to */
    if(eco32_ra_ever_killed()) {
        /* note: reg 31 is return address register */
        emit_move_insn(gen_rtx_MEM(SImode, 
                           plus_constant(stack_pointer_rtx, 
                                         -4 + stackptr_off)),  
                       gen_rtx_REG(SImode, 31));
    }

    /* save the frame pointer, if it is needed */
    if(frame_pointer_needed) {
        emit_move_insn(gen_rtx_MEM(SImode, 
                           plus_constant(stack_pointer_rtx, 
                                         -8 + stackptr_off)), 
                       hard_frame_pointer_rtx);
    }

    /* save callee save registers */
    for(i=0, j=3; i<FIRST_PSEUDO_REGISTER; i++) {
        /* if we ever use the register, and if it's not used in calls
         * (would be saved already) and it's not a special register */
        if(df_regs_ever_live_p(i) && 
           !call_used_regs[i] && !fixed_regs[i]) {
            emit_move_insn(gen_rtx_MEM(SImode, 
                               plus_constant(stack_pointer_rtx, 
                                             -4 * j + stackptr_off)), 
                           gen_rtx_REG(SImode, i));
            j++;
        }
    }

    /* set the new frame pointer, if it is needed now */
    if(frame_pointer_needed) {
        emit_move_insn(hard_frame_pointer_rtx, 
                       plus_constant(stack_pointer_rtx, stackptr_off));
    }
}

I omitted some code that deals with other issues, primarily telling GCC which instructions are important for exception handling (i.e. where the frame pointer is stored and so on). Callee-saved registers are the ones that the caller doesn't need to save prior to a call; the called function takes care of saving and restoring them as needed. As you can see in the first lines, we always allocate space for the return address and frame pointer. That space is just a few bytes and won't matter, but we only generate the stores/loads when necessary.

Finally, note that the "hard" frame pointer is the "real" frame pointer register; it's necessary for some GCC-internal reasons. The "frame_pointer_needed" flag is set by GCC whenever it cannot omit storing the frame pointer. In some cases it has to be stored, for example when alloca (which changes the stack pointer dynamically) is used. GCC takes care of all that. Note that it has been some time since I wrote that code, so I hope the additional comments I added above are not all wrong :)
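To make the offset arithmetic in the prologue easier to follow, here is a standalone sketch that computes the same numbers outside of GCC. The struct and function names are hypothetical and not part of the port; it assumes 4-byte slots as in the code above.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: the frame layout implied by
 * the prologue above, with all offsets relative to the NEW stack pointer. */
struct frame_layout {
    int frame_total;  /* bytes subtracted from the stack pointer */
    int ra_offset;    /* slot reserved for the return address */
    int fp_offset;    /* slot reserved for the frame pointer */
};

struct frame_layout compute_frame_layout(int callee_saved, int local_bytes) {
    struct frame_layout l;

    assert(callee_saved >= 0 && local_bytes >= 0);

    /* two extra slots are always reserved: return address and frame pointer */
    l.frame_total = (callee_saved + 2) * 4 + local_bytes;
    l.ra_offset = l.frame_total - 4;
    l.fp_offset = l.frame_total - 8;
    return l;
}

/* Offset of the k-th saved callee-saved register (k = 0, 1, ...).
 * The prologue's counter j starts at 3, i.e. just below the FP slot. */
int callee_saved_offset(const struct frame_layout *l, int k) {
    return l->frame_total - 4 * (k + 3);
}
```

For example, with 2 callee-saved registers and 8 bytes of locals, the frame is 24 bytes: the return address slot sits at offset 20, the frame pointer slot at 16, the saved registers at 12 and 8, and the locals occupy the remaining bytes below them.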

Johannes Schaub - litb