So I'm aware that nothing is atomic in C++. But I'm trying to figure out if there are any "pseudo-atomic" assumptions I can make. The reason is that I want to avoid using mutexes in some simple situations where I only need very weak guarantees.

1) Suppose I have globally defined volatile bool b, which initially I set true. Then I launch a thread which executes a loop

while(b) doSomething();

Meanwhile, in another thread, I execute b=true.

Can I assume that the first thread will continue to execute? In other words, if b starts out as true, and the first thread checks the value of b at the same time as the second thread assigns b=true, can I assume that the first thread will read the value of b as true? Or is it possible that at some intermediate point of the assignment b=true, the value of b might be read as false?

2) Now suppose that b is initially false. Then the first thread executes

bool b1=b;
bool b2=b;
if(b1 && !b2) bad();

while the second thread executes b=true. Can I assume that bad() never gets called?

3) What about an int or other builtin types: suppose I have volatile int i, which is initially (say) 7, and then I assign i=7. Can I assume that, at any time during this operation, from any thread, the value of i will be equal to 7?

4) I have volatile int i=7, and then I execute i++ from some thread, and all other threads only read the value of i. Can I assume that i never has any value, in any thread, except for either 7 or 8?

5) I have volatile int i, from one thread I execute i=7, and from another I execute i=8. Afterwards, is i guaranteed to be either 7 or 8 (or whatever two values I have chosen to assign)?

+10  A: 

There are no threads in standard C++, and Threads cannot be implemented as a library.

Therefore, the standard has nothing to say about the behaviour of programs which use threads. You must look to whatever additional guarantees are provided by your threading implementation.

That said, in threading implementations I've used:

(1) yes, you can assume that irrelevant values aren't written to variables. Otherwise the whole memory model goes out the window. But be careful that when you say "another thread" never sets b to false, that means anywhere, ever. If it does, that write could perhaps be re-ordered to occur during your loop.

(2) no, the compiler can re-order the assignments to b1 and b2, so it is possible for b1 to end up true and b2 false. In such a simple case I don't know why it would re-order, but in more complex cases there might be very good reasons.

[Edit: oops, by the time I got to answering (2) I'd forgotten that b was volatile. Reads from a volatile variable won't be re-ordered, sorry, so yes on a typical threading implementation (if there is any such thing), you can assume that you won't end up with b1 true and b2 false.]

(3) same as 1. volatile in general has nothing to do with threading at all. However, it is quite exciting in some implementations (Windows), and might in effect imply memory barriers.

(4) on an architecture where int writes are atomic, yes, although volatile has nothing to do with it. See also...

(5) check the docs carefully. Likely yes, and again volatile is irrelevant, because on almost all architectures int writes are atomic. But if an int write is not atomic, then no (and no for the previous question): even if the variable is volatile, you could in principle read a value that was never assigned. Given the values 7 and 8, though, it would take a pretty weird architecture for the byte containing the differing bits to be written in two stages; with values differing in more than one byte, a partial write is more plausible.

For a more plausible example, suppose that for some bizarre reason you have a 16-bit int on a platform where only 8-bit writes are atomic. Odd, but legal, and since int must be at least 16 bits you can see how it could come about. Suppose further that your initial value is 255. Then increment could legally be implemented as:

  • read the old value
  • increment in a register
  • write the most significant byte of the result
  • write the least significant byte of the result.

A read-only thread which interrupted the incrementing thread between the third and fourth steps of that, could see the value 511. If the writes are in the other order, it could see 0.

An inconsistent value could be left behind permanently if one thread is writing 255, another thread is concurrently writing 256, and the writes get interleaved. Impossible on many architectures, but to know that this won't happen you need to know at least something about the architecture. Nothing in the C++ standard forbids it, because the C++ standard talks about execution being interrupted by a signal, but otherwise has no concept of execution being interrupted by another part of the program, and no concept of concurrent execution. That's why threads aren't just another library: adding threads fundamentally changes the C++ execution model. It requires the implementation to do things differently, as you'll eventually discover if for example you use threads under gcc and forget to specify -pthread.

The same could happen on a platform where aligned int writes are atomic, but unaligned int writes are permitted and not atomic. For example IIRC on x86, unaligned int writes are not guaranteed atomic if they cross a cache line boundary. x86 compilers will not mis-align a declared int variable, for this reason and others. But if you play games with structure packing you could probably provoke an example.

So: pretty much any implementation will give you the guarantees you need, but might do so in quite a complicated way.

In general, I've found that it is not worth trying to rely on platform-specific guarantees about memory access, that I don't fully understand, in order to avoid mutexes. Use a mutex, and if that's too slow use a high-quality lock-free structure (or implement a design for one) written by someone who really knows the architecture and compiler. It will probably be correct, and subject to correctness will probably outperform anything I invent myself.

Steve Jessop
§1.9 of the standard discusses the abstract machine, which is like a thread of execution.
Potatoswatter
Why should you consider each thread as running on a separate C++ abstract machine? Does your threading implementation say that this is what happens? Is it even possible to do this? If I can consider threadA as a separate abstract machine subject to the rules of 1.9, then there is no legal way that a non-volatile variable can change value while I'm running, unless I change it. And yet, exactly that can and does happen in all threading implementations I'm aware of.
Steve Jessop
@Steve: it may be possible, but it's a bad idea. Doing that causes the abstract machine *you* have implemented to violate §1.9. Likewise, I can pass a pointer to a non-volatile variable to the kernel and ask for I/O to it. That doesn't make my C++ implementation noncompliant, it's my bug.
Potatoswatter
Does that mean a threading library which specified it may only be used on volatile variables would be standard compliant?
SilverSun
@Silver: threading libraries typically provide their own types, which have `volatile` sprinkled around their internals, and interact with little outside those types. Steve's point does hold to the point that the standard says nothing about a threading library, so no threading library can be "compliant."
Potatoswatter
Thanks for the answers. I agree about using mutexes rather than assumptions about compiler/architecture. But in some cases it does seem like overkill, e.g. if there is a bool which starts out false, is guaranteed to only change from false to true during its lifetime, and gets read a lot from many threads, it seems truly wasteful to guard all reads with a lock. I feel a lot less comfortable about trying to avoid mutexes for non-boolean types, though, for the reasons you point out.
dan
Sorry, what's a bad idea? You said, in a comment since edited, that threads are separate abstract machines. Now you say they're not? Anyway, they aren't. The abstract machine in 1.9 is not "like a thread of execution", it is "like a single-threaded environment". One thread of a multi-threaded environment is not the same thing at all, and threading implementations of C++ must define their own execution model. In particular, threading introduces writes on the same object which aren't relatively ordered. In C++ that's undefined behavior when it happens due to lack of a sequence point.
Steve Jessop
@dan: you almost certainly can do what you expect with that bool, since your threading implementation will provide guarantees which the C++ standard doesn't. Just don't look to the standard for the required behavior. I'd be quite surprised by an implementation in which a bool takes some other value whilst changing from false to true. I'd be much less surprised by an implementation in which a busy-looping thread, *even reading a volatile variable*, never ever "sees" a change written in another thread. volatile != memory barrier, unless the platform says so, and some platforms don't.
Steve Jessop
+1 for threads can't be implemented as a library link - too many people don't know this.
foxwoods
A: 

If your C++ implementation supplies the library of atomic operations specified by n2145 or some variant thereof, you can presumably rely on it. Otherwise, you cannot in general rely on "anything" about atomicity at the language level, since multitasking of any kind (and therefore atomicity, which deals with multitasking) is not specified by the existing C++ standard.

Alex Martelli
A: 

volatile in C++ does not play the same role as in Java. All these cases are undefined behavior, as Steve says. Some of them may happen to work with a given compiler, on a given processor architecture, and with a given threading system, but switching the optimization flags can make your program behave differently, as C++03 compilers don't know about threads.

C++0x defines rules that avoid race conditions and operations that help you control them, but to my knowledge there is not yet a compiler that implements all the parts of the standard related to this subject.

Vicente Botet Escriba
A: 

It's generally a really, really bad idea to depend on this, as you could end up with bad things happening only on some architectures. The best solution would be to use a guaranteed atomic API, for example the Windows Interlocked API.

DeadMG
+4  A: 

Most of the answers correctly address the CPU memory ordering issues you're going to experience, but none have detailed how the compiler can thwart your intentions by re-ordering your code in ways that break your assumptions.

Consider an example taken from this post:

volatile int ready;       
int message[100];      

void foo(int i) 
{      
    message[i/10] = 42;      
    ready = 1;      
}

At -O2 and above, recent versions of GCC and Intel C/C++ (don't know about VC++) will do the store to ready first, so it can be overlapped with computation of i/10 (volatile does not save you!):

    leaq    _message(%rip), %rax
    movl    $1, _ready(%rip)      ; <-- whoa Nelly!
    movq    %rsp, %rbp
    sarl    $2, %edx
    subl    %edi, %edx
    movslq  %edx,%rdx
    movl    $42, (%rax,%rdx,4)

This isn't a bug, it's the optimizer exploiting CPU pipelining. If another thread is waiting on ready before accessing the contents of message then you have a nasty and obscure race.

Employ compiler barriers to ensure your intent is honored. An example that also exploits the relatively strong ordering of x86 are the release/consume wrappers found in Dmitriy Vyukov's Single-Producer Single-Consumer queue posted here:

// load with 'consume' (data-dependent) memory ordering 
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T> 
T load_consume(T const* addr) 
{  
  T v = *const_cast<T const volatile*>(addr); 
  __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
  return v; 
} 

// store with 'release' memory ordering 
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T> 
void store_release(T* addr, T v) 
{ 
  __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
  *const_cast<T volatile*>(addr) = v; 
} 

I suggest that if you are going to venture into the realm of concurrent memory access, you use a library that will take care of these details for you. While we all wait for n2145 and std::atomic, check out Intel Threading Building Blocks' tbb::atomic or the upcoming boost::atomic.

Besides correctness, these libraries can simplify your code and clarify your intent:

std::atomic<int> foo(0);  // shared; or tbb::atomic, boost::atomic, etc.

// thread 1
foo.store(1, std::memory_order_release);

// thread 2
int tmp = foo.load(std::memory_order_acquire);

Using explicit memory ordering, foo's inter-thread relationship is clear.

sstock