This question is mostly academic. I ask out of curiosity, not because this poses an actual problem for me.

Consider the following incorrect C program.

#include <signal.h>
#include <stdio.h>

static int running = 1;

void handler(int u) {
    running = 0;
}

int main() {
    signal(SIGTERM, handler);
    while (running)
        ;
    printf("Bye!\n");
    return 0;
}

This program is incorrect because the handler interrupts the program flow, so running can be modified at any time and should therefore be declared volatile. But let's say the programmer forgot that.

gcc 4.3.3, with the -O3 flag, compiles the loop body (after one initial check of the running flag) down to the infinite loop

.L7:
        jmp     .L7

which was to be expected.

Now we put something trivial inside the while loop, like:

    while (running)
        putchar('.');

And suddenly, gcc does not optimize the loop condition anymore! The loop body's assembly now looks like this (again at -O3):

.L7:
        movq    stdout(%rip), %rsi
        movl    $46, %edi
        call    _IO_putc
        movl    running(%rip), %eax
        testl   %eax, %eax
        jne     .L7

We see that running is re-loaded from memory each time through the loop; it is not even cached in a register. Apparently gcc now thinks that the value of running could have changed.

So why does gcc suddenly decide that it needs to re-check the value of running in this case?

+3  A: 

Because the call to putchar() could change the value of running (GCC only knows that putchar() is an external function and does not know what it does - for all GCC knows putchar() could call handler()).

R Samuel Klatchko
Excellent point about `handler` being non-static. So even though `putchar` is defined in another translation unit, it could still indirectly change `running` that way.
Johannes Schaub - litb
+3  A: 

GCC probably assumes that the call to putchar can modify any global variable, including running.

Take a look at the pure function attribute, which states that the function does not have side-effects on the global state. I suspect if you replace putchar() with a call to a "pure" function, GCC will reintroduce the loop optimization.
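
As a rough sketch of that suggestion (not something checked against gcc 4.3.3): is_dot, stop and count_dots below are hypothetical names, and __attribute__((pure)) is the GCC spelling of the attribute. Because is_dot is declared pure, GCC is allowed to assume the call leaves running untouched, so it may load the flag once before the loop. Whether it actually does so depends on the compiler version and flags.

```c
static int running = 1;          /* deliberately not volatile, as in the question */

/* Hypothetical helper. The pure attribute promises that the function has no
   side effects: it may read global memory but never writes to it. */
int is_dot(int c) __attribute__((pure));

/* Hypothetical stand-in for whatever clears the flag in normal program flow. */
void stop(void) {
    running = 0;
}

/* Counts '.' characters in s while the flag is set. The only call inside the
   loop is pure, so nothing in the loop body can modify running. */
int count_dots(const char *s) {
    int n = 0;
    while (running && *s != '\0')
        n += is_dot(*s++);
    return n;
}

int is_dot(int c) {
    return c == '.';
}
```

Comparing the -O3 assembly of count_dots with and without the attribute should show whether the reload of running disappears on a given compiler.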

Mike Mueller
+5  A: 

In the general case it's difficult for a compiler to know exactly which objects a function might have access to and therefore could potentially modify. At the point where putchar() is called, GCC doesn't know if there might be a putchar() implementation that might be able to modify running so it has to be somewhat pessimistic and assume that running might in fact have been changed.

For example, there might be a putchar() implementation later in the translation unit:

int putchar( int c)
{
    running = c;
    return c;
}

Even if there's not a putchar() implementation in the translation unit, there could be something that might, for example, pass the address of the running object such that putchar might be able to modify it:

void foo(void)
{
    set_putchar_status_location( &running);
}

Note that your handler() function is globally accessible, so putchar() might call handler() itself (directly or otherwise), which is an instance of the above situation.
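
To make that escape route concrete, here is a minimal single-file sketch. set_status_location, emit and loop_iterations are hypothetical names; emit plays the role of putchar, and the first two functions stand in for code that would normally live in another translation unit. Once &running has been handed out, a call that looks unrelated can legitimately clear the flag, which is why the compiler has to reload it after every call it cannot see into.

```c
#include <stddef.h>

static int running = 1;          /* not volatile, as in the question */

/* Imagine everything from here down to emit() living in a different file. */
static int *status_location;

void set_status_location(int *p) {
    status_location = p;
}

/* Looks like a plain output routine, but writes through the stored pointer. */
int emit(int c) {
    if (status_location != NULL)
        *status_location = 0;
    return c;
}

/* Returns the number of times the loop body ran before the flag was cleared. */
int loop_iterations(void) {
    int n = 0;
    running = 1;
    set_status_location(&running);
    while (running) {
        emit('.');
        ++n;
    }
    return n;
}
```

Here the loop body runs exactly once, because the first call to emit clears running through the pointer. Had the compiler kept the flag in a register across the call, the loop would never end.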

On the other hand, since running is visible only to the translation unit (being static), by the time the compiler gets to the end of the file it should be able to determine that there is no opportunity for putchar() to access it (assuming that's the case), and the compiler could go back and 'fix up' the pessimization in the while loop.

Since running is static, the compiler might be able to determine that it's not accessible from outside the translation unit and make the optimization you're talking about. However, since it's accessible through handler() and handler() is accessible externally, the compiler can't optimize the access away. Even if you make handler() static, it's still accessible externally, since you pass its address to another function.

Note that in your first example, even though what I said in the paragraph above still holds, the compiler can optimize away the access to running. The 'abstract machine' that the C language is defined in terms of doesn't account for asynchronous activity except in very limited circumstances. One of these is the volatile keyword and another is signal handling, but the requirements on signal handling aren't strong enough to stop the compiler from optimizing away the access to running in your first example.

In fact, here's something the C99 standard says about the abstract machine behavior in pretty much these exact circumstances:

5.1.2.3/8 "Program execution"

EXAMPLE 1:

An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics. The keyword volatile would then be redundant.

Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics. Furthermore, at the time of each such function entry the values of the parameters of the called function and of all objects accessible via pointers therein would agree with the abstract semantics. In this type of implementation, objects referred to by interrupt service routines activated by the signal function would require explicit specification of volatile storage, as well as other implementation defined restrictions.

Finally, you should note that the C99 standard also says:

7.14.1.1/5 "The signal function"

If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t...

So strictly speaking the running variable may need to be declared as:

volatile sig_atomic_t running = 1;
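
Put together, a version of the flag that follows both rules might look like the sketch below. spin_until_sigterm is a hypothetical wrapper, and the call to raise is there only so that the example terminates on its own; in the original program the signal would arrive from outside the process.

```c
#include <signal.h>

/* volatile forces a fresh load on every read; sig_atomic_t is the one object
   type a handler may portably assign to. static is kept from the question. */
static volatile sig_atomic_t running = 1;

static void handler(int signum) {
    (void)signum;                /* unused */
    running = 0;
}

/* Hypothetical wrapper: spins until SIGTERM clears the flag and returns the
   number of iterations. raise_after must be at least 1. */
int spin_until_sigterm(int raise_after) {
    int iterations = 0;
    running = 1;
    if (signal(SIGTERM, handler) == SIG_ERR)
        return -1;
    while (running) {
        ++iterations;
        if (iterations == raise_after)
            raise(SIGTERM);      /* handler has run by the time raise returns */
    }
    return iterations;
}
```

Calling spin_until_sigterm(3) returns 3: the handler clears the flag during the third iteration, and the next evaluation of the loop condition sees the new value.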
Michael Burr
Is your last paragraph true given that handler() is non-static (and that its address is passed to the signal() call)? I'd have thought those would prevent any optimization based solely on the contents of this C file (this lines up with what litb describes in his comment to the OP).
Mike Dinsdale
@Mike Dinsdale: you're correct. I tried to cover that by saying "if that's the case", which isn't the case here (as mentioned in the previous paragraph). But on second reading that's not very clear. Let me see if I can clean up that paragraph a bit.
Michael Burr
A: 

Thank you all for your answers and comments. They have been very helpful, but none of them provide the full story. [Edit: Michael Burr's answer now does, making this somewhat redundant.] I'll sum up here.

Even though running is static, handler is not static; therefore it might be called from putchar and change running in that way. Since the implementation of putchar is not known at this point, it could conceivably call handler from the body of the while loop.

Suppose handler were static. Can we optimize away the running check then? The answer is no, because the signal implementation is also outside this compilation unit. For all gcc knows, signal might store the address of handler somewhere (which, in fact, it does), and putchar might then call handler through this pointer even though it has no direct access to that function.

So in what cases can the running check be optimized away? It seems that this is only possible if the loop body does not call any functions from outside this translation unit, so that it is known at compilation time what does and does not happen inside the loop body.

This explains why forgetting a volatile is not as big a deal in practice as it might seem at first.

Thomas