I never thought I'd be posting an assembly question. :-)

In GCC, there is an extended version of the asm statement. It takes four parts: assembly-code, output-list, input-list and overwrite-list.

My question is: are the registers in the overwrite-list zeroed out? What happens to the values that were previously in them (left there by other code)?

Update: Having considered the answers so far (thank you!), I want to add that although a register is listed in the clobber-list, in my case it is only used by a pop (popl) instruction. There is no other reference to it.

+4  A: 

I suspect the overwrite list is there to tell GCC not to keep anything of value in these registers across the asm statement. Since GCC doesn't analyze the assembly you give it, and certain instructions have side effects on registers not explicitly named in the code, this is how you tell GCC about them.

Paul Betts
Thank you for your answer. This is also valuable information for this question.
Frank V
+4  A: 

If by "zeroed out" you mean "the values in the registers are replaced with 0's to prevent me from knowing what some other function was doing" then no, the registers are not zeroed out before use. But it shouldn't matter because you're telling GCC you plan to store information there, not that you want to read information that's currently there.

You give this information to GCC so that (reading the documentation) "you need not guess which registers or memory locations will contain the data you want to use" when you're finished with the assembly code (e.g., you don't have to remember whether the data will be in the stack register or some other register).

GCC needs a lot of help for assembly code because "The compiler ... does not parse the assembler instruction template and does not know what it means or even whether it is valid assembler input. The extended asm feature is most often used for machine instructions the compiler itself does not know exist."

Update

GCC is designed as a multi-pass compiler, and several of its stages are in fact entirely separate programs. A set of programs forming "the compiler" translates your source from C, C++, Ada, Java, etc. into assembly code. A separate program (gas, the GNU Assembler) then turns that assembly code into a binary (and then ld and collect2 do more work on the binary). Assembly blocks exist to pass text directly to gas. The input, output and clobber lists exist so that the compiler can do whatever setup is needed to pass information between the C, C++, Ada, Java, etc. side of things and the gas side of things, and to guarantee that any important information currently in registers is protected from the assembly block, for example by copying it to memory before the block runs and copying it back afterward.

The alternative would be to save and restore every register around every assembly block. On a machine with a large number of registers that could get expensive (the Itanium, for instance, has 128 general registers, another 128 floating-point registers and 64 one-bit predicate registers).

It's been a while since I've written any assembly code. And I have much more experience using GCC's named operands feature than doing things with specific registers. So, looking at an example:

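#include <stdio.h>

/* Returns l + 1, computed in inline assembly (x86 / x86-64).
   int, not long: movl/incl are 32-bit instructions, and a 64-bit long
   would get 64-bit registers on x86-64, which movl rejects. */
int foo(int l)
{
    int result;
    asm (
        "movl %[l], %[reg];"    /* copy the input into the output register */
        "incl %[reg];"          /* add one to it */
        : [reg] "=r" (result)   /* output operand, called "reg" in the template */
        : [l] "r" (l)           /* input operand, called "l" in the template */
        : "cc"                  /* incl modifies the flags */
    );
    return result;
}

int main(void)
{
    printf("%d\n", foo(5));
    return 0;
}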

I have asked for an output register, which I will call reg inside the assembly code, and that GCC will automatically copy to the result variable on completion. There is no need to give this variable different names in C code vs assembly code; I only did it to show that it is possible. Whichever physical register GCC decides to use (whether it's %eax, %ebx, %ecx, etc.), GCC will take care of copying any important data from that register into memory when I enter the assembly block, so that I have full use of that register until the end of the assembly block.

I have also asked for an input register, which I will call l both in C and in assembly. GCC promises that whatever physical register it decides to give me will have the value currently in the C variable l when I enter the assembly block. GCC will also do any needed recordkeeping to protect any data that happens to be in that register before I enter the assembly block.

What if I add a line to the assembly code? Say:

"addl %[reg], %%ecx;"

Since the compiler part of GCC doesn't check the assembly code, it won't have protected the data in %ecx. If I'm lucky, %ecx holds nothing GCC needs at that point. If I'm not lucky, I will have "mysteriously" changed a value in some other part of my program.
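To make an instruction like that safe, the register it touches has to be named in the clobber list. A minimal sketch, assuming x86 or x86-64 with GCC or Clang; `foo_scratch` is a hypothetical variant of `foo` that deliberately uses %ecx as a scratch register (computing 2*l + 1) and declares both %ecx and the flags (`"cc"`) as clobbered:

```c
/* Hypothetical example: computes 2*l + 1 using %ecx as an explicit
   scratch register. Because the template names %ecx directly, it must
   appear in the clobber list. x86 / x86-64 only. */
static int foo_scratch(int l)
{
    int result;
    __asm__ (
        "movl %[l], %%ecx;"      /* ecx = l */
        "addl %%ecx, %%ecx;"     /* ecx = 2 * l */
        "movl %%ecx, %[reg];"    /* reg = ecx (l is no longer needed) */
        "incl %[reg];"           /* reg = 2 * l + 1 */
        : [reg] "=r" (result)
        : [l] "r" (l)
        : "ecx", "cc"            /* registers and flags the template changes */
    );
    return result;
}
```

With `"ecx"` in the clobber list, GCC will not assign %ecx to `%[reg]` or `%[l]`, and it will keep anything it cares about out of %ecx (or save it) around the block. `foo_scratch(5)` returns 11.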

Max Lybbert
Thank you for your answer. Your answer (as well as the others) has given me the information I need. I updated my question and hope you'll consider the new information with regard to your answer. Thanks again.
Frank V
Thank you for the detailed information. I understand it, and, more importantly, it allows me to address and understand my program's issue. Again, thank you.
Frank V
+7  A: 

No, they are not zeroed out. The purpose of the overwrite list (more commonly called the clobber list) is to inform GCC that, as a result of the asm instructions, the register(s) listed in the clobber list will be modified, so the compiler should preserve any of them that are currently live.

For example, on x86 the cpuid instruction returns information in four parts using four fixed registers: %eax, %ebx, %ecx and %edx, based on the input value of %eax. If we were only interested in the result in %eax and %ebx, then we might (naively) write:

int input_res1 = 0; // also used for first part of result 
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2) );

This would get the first and second parts of the result in the C variables input_res1 and res2; however, if GCC was using %ecx and %edx to hold other data, they would be overwritten by the cpuid instruction without GCC knowing. To prevent this, we use the clobber list:

int input_res1 = 0; // also used for first part of result 
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2)
                : : "%ecx", "%edx" );

As we have told GCC that %ecx and %edx will be overwritten by this asm statement, it can handle the situation correctly: either by not using %ecx or %edx, or by saving their values to the stack before the asm statement and restoring them afterward.
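For reference, here is a self-contained version of the clobber-list example above, wrapped in a function. It is a sketch assuming x86 or x86-64 with GCC or Clang; the name `cpuid_leaf0` and its interface are hypothetical. Leaf 0 returns the highest supported basic leaf in %eax and the first four bytes of the vendor string in %ebx:

```c
#include <string.h>

/* Hypothetical helper: runs CPUID leaf 0, keeping only %eax and %ebx.
   CPUID also writes %ecx and %edx, so they go in the clobber list.
   x86 / x86-64 only. */
static unsigned cpuid_leaf0(char vendor_prefix[5])
{
    unsigned eax = 0;   /* input: leaf 0; output: highest basic leaf */
    unsigned ebx;       /* output: first 4 bytes of the vendor string */
    __asm__ volatile ("cpuid"
                      : "+a"(eax), "=b"(ebx)
                      :
                      : "ecx", "edx");
    /* x86 is little-endian, so the bytes come out in string order */
    memcpy(vendor_prefix, &ebx, 4);
    vendor_prefix[4] = '\0';
    return eax;
}
```

On an Intel CPU the prefix is "Genu" (from "GenuineIntel"); on AMD it is "Auth" (from "AuthenticAMD"). Leaf 0 ignores the incoming %ecx, but some other leaves read it as a sub-leaf index, in which case it should be passed as an input operand rather than left uninitialized. Some older GCC releases also refuse `"=b"` in 32-bit position-independent code, where %ebx holds the GOT pointer.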

Update:

With regards to your second question (why you are seeing a register listed in the clobber list for a popl instruction) - assuming your asm looks something like:

__asm__("popl %%eax" : : : "%eax" );

Then the code here pops an item off the stack but doesn't care about the actual value: it's probably just keeping the stack balanced, or the value isn't needed in this code path. By writing it this way, as opposed to:

int trash; // don't ever use this.
__asm__("popl %0" : "=r"(trash));

You don't have to explicitly create a temporary variable to hold the unwanted value. Admittedly, in this case there isn't a huge difference between the two, but the version with the clobber makes it clear that you don't care about the value from the stack.
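A bare popl inside inline asm is awkward to run as a standalone example (the compiler owns the stack, and popl does not assemble in 64-bit mode), so here is a sketch of the same two styles using mull instead, assuming x86 or x86-64 with GCC or Clang. mull multiplies %eax by its operand and writes the 64-bit product to %edx:%eax; the hypothetical functions below keep only the low half and discard %edx either through the clobber list or through a dummy output:

```c
/* Style 1: the unwanted high half goes to %edx, which is simply
   declared clobbered. x86 / x86-64 only. */
static unsigned mul_low_clobber(unsigned a, unsigned b)
{
    __asm__ ("mull %[b]"
             : "+a"(a)        /* eax: in = a, out = low 32 bits */
             : [b] "r"(b)     /* GCC won't pick a clobbered register here */
             : "edx", "cc");  /* high 32 bits land in edx and are discarded */
    return a;
}

/* Style 2: the high half is captured in a dummy variable instead. */
static unsigned mul_low_dummy(unsigned a, unsigned b)
{
    unsigned high_unused;     /* receives edx; never used */
    __asm__ ("mull %[b]"
             : "+a"(a), "=&d"(high_unused)  /* '&': don't share edx with an input */
             : [b] "r"(b)
             : "cc");
    (void)high_unused;        /* silence "set but not used" warnings */
    return a;
}
```

Both return the same value as the plain C expression `a * b` on unsigned ints; the clobber version simply avoids naming a variable nobody reads.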

Dave Rigby
Both your answer and Max's answer my question. But I can only pick one as the accepted answer, and because Max (somehow) came closer to my actual problem, I'm going to give him the green check mark. I would have split the credit between you two if possible. Your answer also helped me, and I want to thank you for taking the time to address my question.
Frank V