I'm writing a C++ app.

I have a class variable that more than one thread is writing to.

In C++, anything that can be modified without the compiler "realizing" that it's being changed needs to be marked volatile, right? So if my code is multithreaded, and one thread may write to a variable while another reads from it, do I need to mark the variable volatile?

[I don't have a race condition since I'm relying on writes to ints being atomic]
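For concreteness, here is a minimal sketch of the situation (class and member names are just illustrative):

class Worker {
public:
    Worker() : stop_requested(0) {}

    // One thread writes this member while another reads it.
    // The question: does it need to be declared volatile?
    int stop_requested;
};

// Runs in thread A: signal the other thread to stop.
void request_stop(Worker& w) { w.stop_requested = 1; }

// Runs in thread B: spin until the flag is set.
void wait_for_stop(const Worker& w)
{
    while (w.stop_requested == 0) {
        // do work
    }
}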

Thanks!

+2  A: 

volatile instructs the compiler not to optimize based on its own assumptions about a variable's value or usage, since the variable could be modified "from the outside".

volatile won't provide any synchronization, however, and your assumption that writes to an int are atomic is anything but realistic!

I'd guess we'd need to see some of your code to know whether volatile is needed in your case (or to check the behavior of your program), and more importantly whether you need some sort of synchronization.
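For illustration, a minimal sketch of what explicit synchronization could look like (using a pthread mutex here; Boost or your platform's threading API work the same way, and the names are made up):

#include <pthread.h>

// Hypothetical shared state protected by a mutex rather than volatile.
int shared_value = 0;
pthread_mutex_t shared_mutex = PTHREAD_MUTEX_INITIALIZER;

void write_value(int v)
{
    pthread_mutex_lock(&shared_mutex);
    shared_value = v;                  // the write is published to other threads
    pthread_mutex_unlock(&shared_mutex);
}

int read_value()
{
    pthread_mutex_lock(&shared_mutex);
    int v = shared_value;              // the read sees the latest published write
    pthread_mutex_unlock(&shared_mutex);
    return v;
}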

Francis Boivin
That's NOT true. When you are building HPC applications/algorithms you are usually quite aware of the architecture you are going to run on, and thus you are not going to add a useless lock if you don't need it.
Ben
Well hum, my point of view at least.
Ben
I always thought that writing an int was an atomic op (at least on x86 CPUs). Do you have any good documentation about atomic ops?
Niklaos
@Ben: In HPC apps, you know the architecture, but you don't know what optimizations the compiler will perform. And if one of the things you know about the architecture is "it uses out-of-order execution", which is usually the case, then a synchronization mechanism is absolutely critical... Which it usually also is even without OOO execution, because of compiler optimizations.
jalf
OK, I was just saying that treating an integer write as an atomic operation is a common assumption.
Ben
If there is any possibility that your code might be expected to run on something other than x86, then it's a terrible assumption. Many common embedded platforms do not have data paths that wide; different regions of memory on those platforms have very different alignment expectations, and the compiler hides all of this from you, just doing the right thing, which might mean it has to write one byte at a time, in four instructions. Almost all compilers have some atomic-op headers you can use to make sure your atomics are atomic!
TokenMacGuy
@TokenMacGuy Francis said "This assumption is anything but realistic". I think there are some situations where it makes sense. You don't write an atom-level simulation to be run on an embedded platform, but you *do* need performance. Also, in these applications there are other common optimizations that are much more platform-specific (like placing threads close to the memory on NUMA architectures). That's why hearing that assuming atomic integer writes is unrealistic made me react. :/
Ben
Lock-free programming requires at least two primitives: atomic access and memory ordering. C/C++ provide neither of those guarantees. Most processors explicitly break the memory ordering guarantee. **IF** you can verify the processor provides memory ordering, and **IF** you can verify the compiler's optimizer didn't reorder, and **IF** you can verify the processor writes ints atomically, then yes, you can make this work with volatile. Most processors don't support memory-to-memory atomic writes, so you'll have to check that as well. Have I made it clear that you don't want to do this?
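(A sketch of those two primitives together, assuming GCC-style builtins and all of the IFs above; the names are illustrative:)

int payload = 0;               // the data being handed off
volatile int ready = 0;        // the flag; volatile stops the compiler from
                               // caching it, but provides no ordering by itself

void writer()
{
    payload = 42;              // 1. write the data
    __sync_synchronize();      // 2. full barrier: keep the data write before
    ready = 1;                 //    the flag write (assumes aligned int stores
                               //    are atomic on this processor)
}

void reader()
{
    while (ready == 0) { /* spin */ }   // wait for the flag
    __sync_synchronize();               // order the flag read before the data read
    int value = payload;                // sees 42 only if all the IFs above hold
    (void)value;
}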
caspin
@caspin Memory barriers provided by compilers do provide these guarantees, and a C compiler won't reorder accesses to two volatile variables. Anyway, according to what you said, lock-free programming is impossible on conventional architectures; I have to disagree!
Ben
A: 

Yes, you need volatile (otherwise the variable could be cached in a register).

You probably also need locks if your code may run on a system with more than one CPU core.

Paul R
Even if you're on a single core, the scheduler may interleave threads such that inconsistencies occur.
Andrew
@Andrew - I don't see how that could happen on a single core (assuming that integer writes are indeed atomic, which is a reasonable assumption for most modern architectures)?
Paul R
Why all the down-votes? I wish people would leave comments as to why they are down-voting what appear (to me at least) to be perfectly reasonable answers...
Paul R
@Paul if integer writes are atomic, sure. Can't happen. Shouldn't happen on multicore either.
Andrew
@Andrew, atomic writes will not change the fact that a processor/core can have a value in its cache that is not up to date!
Ben
@Ben: If a value is in a cache, then how can locks work on multiple cores? I have to conclude that a write from one core invalidates caches on other cores, which means that reads/writes from multiple cores which happen at different times would work, but locks would still be needed for overlapping reads/writes.
quamrana
Locks don't just lock; they also synchronize the value among the caches and memory. If you write without locking you have to use volatile or a memory barrier or whatever, because otherwise nothing guarantees that other threads will see the new value.
Ben
A: 

Without locking you may still get 'impossible' re-orderings done by the compiler or processor. And there's no guarantee that writes to ints are atomic.

It would be better to use proper locking.

Douglas Leeder
+9  A: 

C++ doesn't yet have any provision for multithreading. In practice, volatile doesn't do what you want (it was designed for memory-mapped hardware, and while the two issues are similar, they are different enough that volatile doesn't do the right thing -- note that volatile has been given multithreading semantics in other languages).

So if you want to write an object in one thread and read it in another, you'll have to use the synchronization features your implementation provides, where it needs them. For the implementations I know of, volatile plays no role in that.

FYI, the next standard will take MT into account, and volatile will play no role in that, so this won't change. You'll just have standard-defined conditions in which synchronization is needed and standard-defined ways of achieving it.
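For reference, a sketch of the kind of standard-defined facility that means (C++0x's std::atomic, shown purely as an illustration):

#include <atomic>

std::atomic<int> counter(0);   // atomic access and ordering, no volatile needed

void writer() { counter.store(1); }        // a store other threads are guaranteed to see
int  reader() { return counter.load(); }   // a load that is never cached away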

AProgrammer
A: 

Volatile will solve your problem, i.e. it will guarantee consistency among all the caches of the system. However, it will be inefficient since it forces every read or write access to go through memory. You might consider using a memory barrier instead, only where it is needed. If you are working with gcc/icc, have a look at the sync built-ins: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
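For example, a few of the builtins from that page (a sketch; check your compiler version for which ones are available):

int counter = 0;
int flag = 0;

void increment()
{
    // Atomic read-modify-write; also acts as a full memory barrier.
    __sync_fetch_and_add(&counter, 1);
}

bool try_claim()
{
    // Atomically change flag from 0 to 1; returns true if this thread won the race.
    return __sync_bool_compare_and_swap(&flag, 0, 1);
}

void publish()
{
    // Stand-alone full memory barrier, without a read-modify-write.
    __sync_synchronize();
}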

EDIT (mostly about pm100's comment): I understand that my beliefs are not a reference, so I found something to quote :)

The volatile keyword was devised to prevent compiler optimizations that might render code incorrect in the presence of certain asynchronous events. For example, if you declare a primitive variable as volatile, the compiler is not permitted to cache it in a register

From Dr Dobb's

More interesting :

Volatile fields are linearizable. Reading a volatile field is like acquiring a lock; the working memory is invalidated and the volatile field's current value is reread from memory. Writing a volatile field is like releasing a lock : the volatile field is immediately written back to memory. (this is all about consistency, not about atomicity)

from The Art of Multiprocessor Programming, Maurice Herlihy & Nir Shavit

Locks contain memory synchronization code; if you don't lock, you must do something instead, and using the volatile keyword is probably the simplest thing you can do (even if it was designed for external devices whose memory is bound to the address space, that's not the point here).

Ben
Why is this response downvoted?
anon
Because it's wrong. volatile has nothing to do with memory caches.
pm100
@pm100 It was not designed for this, I agree, but it does have to do with caches; see my edit please.
Ben
Your linked article is from 2001; we've since learned our lesson. Please see http://www.drdobbs.com/high-performance-computing/212701484 for a modern view of volatile.
caspin
Caching in a register is not the same as 'the caches of a system', i.e. the problem of cache coherence for multi-processor systems. The reason I jumped on this is because it is radically wrong, as the various articles have pointed out. I used to believe that C++ volatile helped here, but it does not; I am trying to pass on my lessons.
pm100
+1  A: 

Yes, volatile is the absolute minimum you'll need. It ensures that the code generator won't keep the variable in a register and will always perform reads and writes from/to memory. Most code generators can provide atomicity guarantees on variables that have the same size as the native CPU word; they'll ensure the memory address is aligned so that the variable cannot straddle a cache-line boundary.

That is, however, not a very strong contract on modern multi-core CPUs. Volatile does not promise that another thread running on another core can see updates to the variable. That requires a memory barrier, usually an instruction that flushes the CPU cache. If you don't provide a barrier, the thread will in effect keep running until such a flush occurs naturally. That will eventually happen; the thread scheduler is bound to provide one. That can take milliseconds.

Once you've taken care of details like this, you'll eventually have re-invented a condition variable (aka event) that isn't likely to be any faster than the one provided by a threading library, or as well tested. Don't invent your own; threading is hard enough to get right, and you don't need the FUD of not being sure that the very basic primitives are solid.
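For comparison, a minimal sketch of the library-provided version (shown with pthreads; Boost and the upcoming standard have equivalents):

#include <pthread.h>

pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int ready = 0;

void signal_ready()                    // writer thread
{
    pthread_mutex_lock(&mtx);
    ready = 1;                         // update shared state under the lock
    pthread_cond_signal(&cond);        // wake a waiting thread
    pthread_mutex_unlock(&mtx);
}

void wait_ready()                      // reader thread
{
    pthread_mutex_lock(&mtx);
    while (ready == 0)                 // loop guards against spurious wakeups
        pthread_cond_wait(&cond, &mtx);
    pthread_mutex_unlock(&mtx);
}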

Hans Passant
Volatile isn't the absolute minimum. It's well below the minimum. The minimum would also have to prevent read/write reordering around the shared variable.
jalf
@jalf - how many graduations below "absolute minimum" do you care to consider?
Hans Passant
Just one: "below it" ;)
jalf
+1  A: 

I think that volatile only really applies to reading, especially reading memory-mapped I/O registers.

It can be used to tell the compiler not to assume that, once it has read from a memory location, the value won't change:

while (*p)
{
  // ...
}

In the above code, if *p is not written to within the loop, the compiler might decide to move the read outside the loop, more like this:

cached_p = *p;
while (cached_p)
{
  // ...
}

If p is a pointer to a memory-mapped I/O port, you would want the first version, where the port is read on every iteration of the loop.
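(Declaring the pointee volatile is what keeps the first form; a sketch, with a made-up register address:)

// p points at a hypothetical memory-mapped register
volatile unsigned int* const p =
    reinterpret_cast<volatile unsigned int*>(0x40000000);

void wait_for_port()
{
    while (*p)   // volatile forces a fresh read of the register on every iteration
    {
        // ...
    }
}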

If p is a pointer to memory in a multi-threaded app, you're still not guaranteed that writes are atomic.

quamrana
It's not only about reading: "for (i = 0; i < 10; ++i) { j = i; }" can be replaced with "j = 10;" when j is not volatile.
stefaanv