In the following code:

#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int ready = 0;

void wait(void)
{
    int i;
    do
    {
        usleep(1000);
        pthread_mutex_lock(&mutex);
        i = ready;                      /* read the flag under the mutex */
        pthread_mutex_unlock(&mutex);
    } while (i == 0);
    printf("Finished\n");
}

void signal(void)
{
    pthread_mutex_lock(&mutex);
    ready = 1;                          /* set the flag under the mutex */
    pthread_mutex_unlock(&mutex);
}

We spawn two threads: we call wait() in one thread and then call signal() in the other. We also get the compiler to optimize aggressively.
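
For concreteness, a minimal harness along these lines might look like the sketch below. The wrapper names are made up, not part of the question, and wait/signal are kept only to match the code above even though they clash with standard library names.

/* Hypothetical harness, in the same file as the code above: wrap wait()
   and signal() in pthread-compatible entry points and start one thread
   for each. */
#include <pthread.h>

static void *wait_thread(void *arg)   { (void)arg; wait();   return NULL; }
static void *signal_thread(void *arg) { (void)arg; signal(); return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, wait_thread, NULL);
    pthread_create(&t2, NULL, signal_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}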

Now will the code behave as expected or will we need to make ready volatile to get this to work? Will different compilers and libraries handle this differently?

Edit: I am hoping that there might be something about the mutex functions that will prevent optimization around them, or that the compiler generally does not optimize around function calls.

Note: I have not compiled and tested the code yet, will do so when I have a chance.

A: 

Yes, volatile int ready = 0; is always needed here.

Update
If you would like no optimization around some code snippet, you can either use #pragma directives around that code (if your compiler supports them) or - which is totally portable - move the code to a separate file and compile that file with no or little optimization.
The latter approach might still require the volatile keyword, since the ready variable might be used by other modules which you might want to optimize.
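
For example, with GCC such a directive might look like the following sketch (compiler-specific; other compilers have their own mechanisms, e.g. MSVC's #pragma optimize; the function name here is made up):

/* GCC-specific sketch: disable optimization for the functions between
   push_options and pop_options. */
#pragma GCC push_options
#pragma GCC optimize ("O0")

void do_not_optimize_me(void)
{
    /* ... code the optimizer should leave alone ... */
}

#pragma GCC pop_options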

ULysses
Please see my edit.
doron
+3  A: 

I'd be surprised if the compiler assumes anything about a global variable in the presence of library function calls. That being said, volatile will not cost you anything, and it shows your intentions. I'd put it there.

Arkadiy
So, you're assuming the optimizer doesn't look into the code for those library functions? That's probably true today, but may not be next year.
Darron
@Darron: The compiler can't see code it loads from a DLL.
DeadMG
Not all libraries are DLLs. And not all optimizers run at compile time; there have been experiments with just-in-time optimizers for machine code.
Darron
Any compiler that's smart enough to look into libraries should be smart enough to notice kernel calls there and know the implications.
Arkadiy
volatile does have a cost. It prevents optimizations in other places where they would be valid.
doron
The kind of optimization that volatile prevents is the kind of optimization the compiler should not be doing in this context anyway.
Arkadiy
Why not? It's perfectly legal unless you tell it not to by saying "volatile".
Darron
My point is that you could access half a dozen variables in the mutex region -- which of them should not be optimized?
Darron
This article examines the constraints on compiler reordering around pthreads functions: http://www.hpl.hp.com/techreports/2005/HPL-2005-217R1.html
caf
A: 

Volatile is needed here in the presence of optimization. Otherwise the read of ready can be legally moved out of the while loop.
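
For illustration, here is a sketch of the kind of hoisting being described, using a simplified busy-wait with no calls inside the loop (not the original code, which calls into the pthread library inside the loop):

/* Without volatile, the compiler may keep 'ready' in a register and
   read it only once, transforming the loop as the comment shows. */
int ready;

void spin(void)
{
    while (ready == 0)
        ;   /* may legally become: if (ready == 0) for (;;) ; */
}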

Assuming limitations to optimization that are not promised by the standard may be fine now, but cause great grief to future maintainers as compilers improve.

Darron
+1  A: 

You probably don't need volatile but specifying it won't hurt you.

The paper 'Threads Cannot Be Implemented As a Library' by Hans Boehm says that 'in practice' implementations of pthreads force the compiler to treat calls to pthread functions as if they may modify any global variable at all and consequently reads and writes cannot be reordered. The 'in practice' part might make you want to check with your particular implementation.

Cole
A: 

Now will the code behave as expected or will we need to make ready volatile to get this to work?

I would advise using volatile in this situation, though in this case it seems it is not needed.

IOW, personally I would have added volatile and removed the locking: it is not needed for setting/reading the variable's value.
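
A minimal sketch of that variant, with hypothetical function names; note that the comments below this answer dispute whether dropping the lock is actually safe on multi-core machines:

#include <unistd.h>
#include <stdio.h>

/* Lock-free variant described above: a volatile flag and no mutex. */
volatile int ready = 0;

void wait_flag(void)
{
    while (ready == 0)
        usleep(1000);
    printf("Finished\n");
}

void signal_flag(void)
{
    ready = 1;
}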

Will different compilers and libraries handle this differently? I am hoping that there might be something about the mutex functions that will prevent optimization around them, or that the compiler generally does not optimize around function calls.

In your case the call to the function (pthread_mutex_lock()) has side effects and changes the execution environment. Thus the compiler has to re-read the global variable, which might have been changed by the call to the function.

To be really sure, you want to consult C99's 5.1.2.3 Program execution, from which I have borrowed the terminology. To give you a taste:

[...] Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object). [...]

Excerpt from annex C:

The following are the sequence points described in 5.1.2.3:
— The call to a function, after the arguments have been evaluated (6.5.2.2).

And from there on.
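
As a concrete illustration, here is a sketch with a hypothetical external function, assuming the compiler cannot see the callee's body (e.g. no link-time optimization):

/* Because ext_call() is opaque to the compiler and may have side
   effects, 'ready' must be reloaded from memory after each call. */
extern void ext_call(void);   /* hypothetical; body compiled elsewhere */
int ready;

void spin(void)
{
    int i;
    do
    {
        ext_call();   /* may modify any global, including 'ready' */
        i = ready;    /* so this read cannot be hoisted out of the loop */
    } while (i == 0);
}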

In my experience, compilers are sufficiently smart nowadays that even when optimizing aggressively they won't do anything fancy with a loop control variable that happens to be a global variable.

Dummy00001
The locking is needed even if you add the volatile keyword, for two reasons. Updating the integer may not be atomic. And even if the write is atomic, on a multicore machine with separate caches there is no guarantee that the reads and writes will happen in order. The only way to guarantee that they happen in order is to insert a memory barrier instruction. The lock will ensure that this definitely happens.
doron
@deus-ex-machina399: About atomicity: pretty much all archs have int smaller than or equal to the CPU word, which makes the read/write operations atomic. (Increment/decrement/etc. are not atomic because they combine multiple operations: read, change value, write.) About ordering: in the case of the simple read/write code as you have above, cache coherency guarantees are sufficient. http://en.wikipedia.org/wiki/Cache_coherence
Dummy00001
You will have a problem if you write to a global variable (say myval) before calling signal and then the other thread reads that variable after wait exits. Without a memory barrier, the wait thread may see ready set before the write to the global variable (myval) is visible, due to reordering of reads and writes. A data memory barrier will ensure that this does not happen. See http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
doron
@deus-ex-machina399: I have read the link. Nice article, bookmarked. But there are no memory ordering concerns in your test program above. You need memory ordering if you e.g. update two variables and the code depends on the order of the updates. Your code above doesn't: you have only one variable. (As an ex device driver developer I'm well aware of memory ordering concerns, btw.)
Dummy00001
+2  A: 

Some perspective from the kernel kings:

http://kernel.org/doc/Documentation/volatile-considered-harmful.txt

Noah Watkins