A: 

Is this just expected/undefined behaviour? Have I done something wrong or have I in fact found a Compiler Bug™?

No bug just the defined behaviour that optimisation options can produce odd code which might not work :)

EDIT:

If you think you have found a bug in GCC the mailing lists will be glad you dropped by but generally they find some hole in your knowledge is to blame and mock mercilessly :(

In this case I think it's probably the -O options attempting shortcuts that break your code that need working around.

sparkes
+1  A: 

I'm darned if I can find a reference at the moment, but I'm 99% sure that you are always supposed to be able to take the address of an argument, and it's up to the compiler to finesse the details of calling conventions, register usage, etc.

Indeed, I would have thought it to be such a common requirement that it's hard to see there can be general problem in this - I wonder if it's something about the volatile pointers which have upset the optimisation.

Personally, I might do try this to see if it compiled better:

void foo(unsigned int x) 
{
    volatile unsigned int* pArg = &x;
    *(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)pArg;
}
Will Dean
sure you can take the address of it (only time you can't is if you used the register keyword). But this is writing the address of a local which disappears, so it is perfectly legal for the compiler to optimize it out.
Evan Teran
A: 

sparkes wrote

If you think you have found a bug in GCC the mailing lists will be glad you dropped by but generally they find some hole in your knowledge is to blame and mock mercilessly :(

I figured I'd try my luck here first before going to the GCC mailing list to show my incompetence :)


Adam Davis wrote

Out of curiosity, what are you trying to accomplish?

I was trying out development for the Game Boy Advance. I was reading about its DMA system and I experimented with it by creating single-color tile bitmaps. The idea was to have the indexed color be passed as an argument to a function which would use DMA to fill a tile with that color. The source address for the DMA transfer is stored at 0x40000d4.


Will Dean wrote

Personally, I might do try this to see if it compiled better:

void foo(unsigned int x) 
{
    volatile unsigned int* pArg = &x;
    *(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)pArg;
}

With -O0 that works as well and with -O1 that is optimized to the exact same -O1 assembly I've posted in my question.

Tomi Kyöstilä
+2  A: 

So you're putting the address of a local stack variable into the DMA controller to be used, and then you're returning from the function where the stack variable is available?

While this might work with your main() example (since you aren't writing on the stack again) it won't work in a 'real' program later - that value will be overwritten before or while DMA is accessing it when another function is called and the stack is used again.

You need to have a structure, or a global variable you can use to store this value while the DMA accesses it - otherwise it's just going to get clobbered!

Adam Davis
A: 

Not an answer, but just some more info for you. We are running 3.4.5 20051201 (Red Hat 3.4.5-2) at my day job.

We have also noticed some of our code (which I can't post here) stops working when we add the -O1 flag. Our solution was to remove the flag for now :(

Alan H
+2  A: 

I actually don't think the compiler is wrong, although this is an odd case.

From a code analysis point-of-view, it sees you storing the address of a variable, but that address is never dereferenced and you don't jump outside of the function to external code that could use that address you stored. When you exit the function, the address of the stack is now considered bogus, since its the address of a variable that no longer exists.

The "volatile" keyword really doesn't do much in C, especially with regards to multiple threads or hardware. It just tells the compiler that it has to do the access. However, since there's no users of the value of x according to the data flow, there's no reason to store the "1" on the stack.

It probably would work if you wrote

void foo(unsigned int x) {
    volatile int y = x;
    *(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)(&y);
}

although it still may be illegal code, since the address of y is considered invalid as soon as foo returns, but the nature of the DMA system would be to reference that location independently of the program flow.

Ben Combee
A: 

In general I would say, that it is a valid optimization. If you want to look deeper into it, you could compile with -da This generates a .c.Number.Passname, where you can have a look at the rtl (intermediate representation within the gcc). There you can see which pass makes which optimization (and maybe disable just the one, you dont want to have)

flolo
+1  A: 

Once you return from foo(), x is gone, and any pointers to it are invalid. Subsequently using such a pointer results in what the C standard likes to call "undefined behavior," which means the compiler is absolutely allowed to assume you won't dereference it, or (if you insist on doing it anyway) need not produce code that does anything remotely like what you might expect. If you want the pointer to x to remain valid after foo() returns, you must not allocate x on foo's stack, period -- even if you know that in principle, nothing has any reason to clobber it -- because that just isn't allowed in C, no matter how often it happens to do what you expect.

The simplest solution might be to make x a local variable in main() (or in whatever other function has a sufficiently long-lived scope) and to pass the address in to foo. You could also make x a global variable, or allocate it on the heap using malloc(), or set aside memory for it in some more exotic way. You can even try to figure out where the top of the stack is in some (hopefully) more portable way and explicitly store your data in some part of the stack, if you're sure you won't be needing for anything else and you're convinced that's what you really need to do. But the method you've been using to do that isn't sufficiently reliable, as you've discovered.

zaphod
+3  A: 

One thing to note is that according to the standard, casts are r-values. GCC used to allow it, but in recent versions has become a bit of a standards stickler.

I don't know if it will make a difference, but you should try this:

void foo(unsigned int x) {
    volatile unsigned int* ptr = (unsigned int*)(0x4000000 + 0xd4);
    *ptr = (unsigned int)(&x);
}

int main() {
    foo(1);
    while(1);
}

Also, I doubt you intended it, but you are storing the address of the function local x (which is a copy of the int you passed). You likely want to make foo take an "unsigned int *" and pass the address of what you really want to store.

So I feel a more proper solution would be this:

void foo(unsigned int *x) {
    volatile unsigned int* ptr = (unsigned int*)(0x4000000 + 0xd4);
    *ptr = (unsigned int)(x);
}

int main() {
    int x = 1;
    foo(&x);
    while(1);
}

EDIT: finally, if you code breaks with optimizations it is usually a sign that your code is doing something wrong.

Evan Teran
A: 

I think Even T. has the answer. You passed in a variable, you cannot take the address of that variable inside the function, you can take the address of a copy of that variable though, btw that variable is typically a register so it doesnt have an address. Once you leave that function its all gone, the calling function loses it. If you need the address in the function you have to pass by reference not pass by value, send the address. It looks to me that the bug is in your code, not gcc.

BTW, using *(volatile blah *)0xabcd or any other method to try to program registers is going to bite you eventually. gcc and most other compilers have this uncanny way of knowing exactly the worst time to strike.

Say the day you change from this

*(volatile unsigned int *)0x12345 = someuintvariable;

to

*(volatile unsigned int *)0x12345 = 0x12;

A good compiler will realize that you are only storing 8 bits and there is no reason to waste a 32 bit store for that, depending on the architecture you specified, or the default architecture for that compiler that day, so it is within its rights to optimize that to an strb instead of an str.

After having been burned by gcc and others with this dozens of times I have resorted to forcing the issue:

.globl PUT32
PUT32:
   str r1,[r0]
   bx lr





   PUT32(0x12345,0x12);

Costs a few extra clock cycles but my code continues to work yesterday, today, and will work tomorrow with any optimization flag. Not having to re-visit old code and sleeping peacefully through the night is worth a few extra clock cycles here and there.

Also if your code breaks when you compile for release instead of compile for debug, that also means it is most likely a bug in your code.

dwelch