Is there any guarantee by any commonly followed standard (ISO C or C++, or any of the POSIX/SUS specifications) that a variable (perhaps marked volatile), not guarded by a mutex, that is being accessed by multiple threads will become eventually consistent if it is assigned to?

To provide a specific example, consider two threads sharing a variable v, with initial value zero.

Thread 1: v = 1

Thread 2: while(v == 0) yield();

Is thread 2 guaranteed to terminate eventually? Or can it conceivably spin forever because the cache coherency never kicks in and makes the assignment visible in thread 2's cache?

I'm aware the C and C++ standards (before C++0x) do not speak at all about threads or concurrency. But I'm curious if the C++0x memory model, or pthreads, or anything else, guarantees this. (Apparently this does actually work on Windows on 32-bit x86; I'm wondering if it's something that can be relied on generally or if it just happens to work there).

+4  A: 

First off, if it's not marked volatile there is a good chance the compiler will only load it once. So regardless of whether the memory eventually changes, there is no guarantee the compiled code will ever reload it and see the new value.

Since you explicitly say "no mutexes", pthreads doesn't apply.

Beyond that, since C++ does not have a memory model, it depends on the hardware architecture.

R Samuel Klatchko
Not in that example if yield() is a function the compiler cannot see the body of, and the variable is not local to the compilation unit, since the compiler has to assume that the yield() function could change the value of v. @moonshadow's answer still applies, of course.
CesarB
+5  A: 

It's going to depend on your architecture. While it is unusual to require an explicit cache flush or memory sync to ensure memory writes are visible to other threads, nothing precludes it, and I've certainly encountered platforms (including the PowerPC-based device I am currently developing for) where explicit instructions have to be executed to ensure state is flushed.

Note that thread synchronisation primitives like mutexes will perform the necessary work as required, but you don't typically need a full synchronisation primitive if all you want is to ensure the state is visible without caring about consistency - just the sync / flush instruction will suffice.

EDIT: To anyone still in confusion about the volatile keyword - volatile guarantees the compiler will not generate code that explicitly caches data in registers, but this is NOT the same thing as dealing with hardware that transparently caches / reorders reads and writes. Read e.g. this or this, or this Dr Dobbs article, or the answer to this SO question, or just pick your favourite compiler that targets a weakly consistent memory architecture like Cell, write some test code and compare what the compiler generates to what you'd need in order to ensure writes are visible to other processes.

moonshadow
The compiler must do whatever it sees fit to guarantee that all writes to `volatile` variables actually hit main memory, and that will in turn make them visible to other threads.
David Rodríguez - dribeas
@David: this is mistaken. "Accesses to `volatile` objects must be evaluated strictly according to the abstract machine defined by the language standard." This is a statement about what kind of optimisations C++ may perform, not what extra processing the programmer may wish to do to deal with architectural quirks. It is saying the compiler must generate an explicit write instruction for each assignment in the source, but it does not say anything about generating `flush` or `sync` or `eieio` or whatever your CPU may need to actually cause the data to hit memory in program order or at all.
moonshadow
There are more statements about `volatile`. The critical one is that their reads and writes are observable side effects. In particular, the loop from the question **must** read v repeatedly. It may not cache the value. Not in a register, not in L1 cache, not anywhere else.
MSalters
@MSalters: your conclusion, again, is incorrect. Reads and writes to volatiles are observable side effects: again, this is a statement about the kind of optimisations a compiler may not perform, not a statement about additional code it must generate. The compiler may not generate code that caches the volatile data, but the hardware caching data it was told to store is not the compiler's responsibility.
moonshadow
True that you don't usually need to do anything to make the state visible to other threads. But usually you *do* care about consistency (usually you're performing this synchronization in order to ensure that certain code is only executed at a certain time, in a certain state), and then the OP's code can no longer be relied on.
jalf
@moonshadow: you're making an artificial distinction between code generation and optimisations. The compiler must generate correct code, and it doesn't matter what algorithm it uses for that. The same thing applies to your hardware caching notion. A conforming C++ implementation will emit instructions to tell the hardware not to do that. Failing to include any necessary instruction makes an implementation nonconforming; cache directives are no exception.
MSalters
@MSalters: you are exaggerating the compiler's responsibility. It's up to the programmer, not the compiler, to select the most appropriate way of dealing with their hardware's concurrency issues. `volatile` is a way of telling the compiler not to reorder writes so you're not fighting the compiler as well. There is an excellent reason for this separation of responsibility: architectures like the Cell contain multiple synchronisation / fence / barrier instructions with vastly different costs and effects, and the compiler has no way of knowing which is the most appropriate to a given situation.
moonshadow
@MSalters: have a read of [PowerPC Architecture book II](http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html) sections 3.2-3.3, and consider what a compiler that conforms to your interpretation of the C++ spec is supposed to emit. The compiler simply does not have the information for a sensible decision (has the memory page been configured with write-combining? Are you writing to memory shared between threads on the same PPU core, or to some device that is affected by `eieio` but not `lwsync`?), which is why the spec does not require it and real-world compilers do not do it.
moonshadow
@moonshadow: I could care, and a Cell compiler should provide an extension for that. But if I just say `volatile`, I want the compiler to do what the standard tells it to (generate observable reads and writes) without bothering me with details. Should it put volatile variables on a page that is affected by `eieio` but not `lwsync`? I couldn't care less.
MSalters
A: 

I think it will eventually work on any platform, but I have no idea what delay you may see.

But honestly, it is really bad style to do a polling wait for an event. Even though you yield, your thread will be rescheduled again and again without doing anything.

Since you already know how to place a variable somewhere where it is accessible to both, why not use the right tools to do a wait that doesn't eat up resources? A pair of pthread_mutex_t and pthread_cond_t should perfectly do the trick.

Jens Gustedt
+1  A: 

If I've understood the relevant sections correctly, C++0x won't guarantee it for a standalone variable, or even a volatile one (volatile isn't designed for that use), but it introduces atomic types for which you do get the guarantee (see the header <atomic>).

AProgrammer
A: 

Is thread 2 guaranteed to terminate eventually? Or can it conceivably spin forever because the cache coherency never kicks in and makes the assignment visible in thread 2's cache?

If the variable is not volatile, you have no guarantees. Pre-C++0x, the standard just has nothing to say about threads, and since the variable is not volatile, reads/writes are not considered observable side effects, so the compiler is allowed to cheat. Post-C++0x, it's a race condition, which is explicitly stated to be undefined behavior.

If the variable is volatile, you get the guarantee that reads/writes will happen, and that they won't be reordered with respect to other volatile memory accesses.

But you have no guarantee that it won't be reordered with respect to other non-volatile accesses, so you might not get the behavior you expected. In particular, some of the instructions after the while loop, which you're trying to "protect" may be moved up ahead of the loop if the compiler deems it safe (and beneficial) to do so. But in performing this analysis, it only looks at the current thread, not what happens in other threads.

So no, in general, it is not guaranteed to work correctly, even with volatile. It might, and it probably often will, but not always (and it depends on what happens after the loop). It depends on how far the compiler is willing to go with optimizations. But it is allowed to go far enough to break the code. So don't rely on it. If you want to synchronize around something like this, use memory barriers. That's what they're for. (And if you do that, you don't even need the volatile any more)

jalf