views: 122 · answers: 4

Hi,

I have a multi-threaded C++ application.

Now, I know that for global shared variables you should sometimes use volatile when polling the variable's state; otherwise the compiler may assume that the variable's value never changes (in that thread).

What if, however, instead of checking the status of a variable I call a method that returns the value of the variable? For instance:

static int num = 0;

...

void foo()
{
   while(getNum() == 0)
   {
      // do something (or nothing)
   }
}

Would I still have to make num volatile? Or does the compiler recognize that, since I'm accessing num through a method, it shouldn't cache the result?

Anyone got any ideas?

Thanks in advance,

~Julian

edit: I removed the sleep call inside my while loop and replaced it with something generic, such as a comment to do something (or nothing).

+2  A: 

No, volatile is never needed as long as you're doing the necessary synchronization.

Calling thread library synchronization functions, whatever they are on your platform, should take care of invalidating locally "cached" values and making the compiler reload globals.

In this particular case, sleep is likely to have such an effect, but it's not a good implementation anyway. There should be a condition variable associated with num: protect the variable with a setter function, and have the setter signal the condition so foo wakes up.

As to the specific question, whether the function hides the access from optimization is extremely implementation- and situation-dependent. Your best bet is to compile the getter function in a separate invocation of the compiler, but even then, there's no way to guarantee that interprocedural optimization doesn't occur. For example, some platforms may put IR code in the .o files and perform code generation in the "linker" stage.

Disclaimer.

Key words above: 1. as long as you're doing the necessary synchronization and 2. likely to have such an effect.

1: Neither sleep nor an empty busy-loop is "necessary synchronization." That is not the correct way to write a multithreaded program, period. So, volatile may be needed in such cases.

2: Yes, the implementation may not count sleep as an I/O function, and may even label it as pure and free of side effects. In that case, volatile on the global would be necessary. However, I doubt any implementation that breaks sleep loops like that has really been distributed, since they are unfortunately common.

Potatoswatter
+1: `volatile` is *not* a substitute for proper multithreading synchronisation constructs.
Oli Charlesworth
"Calling thread library synchronization functions should take care of invalidating locally 'cached' values and making the compiler reload globals". This is either plain wrong or correct in a subtle way that needs further explaining... My understanding is that it is wrong: calling `sleep` will NOT force the compiler into reloading the value of the global. In fact, since `sleep` is part of the standard libraries, the compiler might know that it does not have any side effects, and as such it can infer that the variable does not need to be reread.
David Rodríguez - dribeas
Read the question/answers linked by gablin in a comment to the question.
David Rodríguez - dribeas
@David: On the other hand, this common use case would be a very strong argument against a `__pure` hint to the backend.
Potatoswatter
@David: sleep had nothing to do with my question...I should have replaced sleep() with just "//do something"
jbu
Wrong, wrong, wrong. First, volatile does NOT affect hardware caching at all. It is a hint to the C/C++ abstract machine, against which the compiler models its code, that it cannot aggressively cache the value in question. There is no requirement in the C++ spec that non-volatile values are ever reloaded at all, whether across sequence points or function calls.
Chris Becke
@David: see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1680.pdf , which is only the first in a looong line of multithreading papers at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2869.html . Unfortunately it fails to reference an earlier, more detailed HP paper which I seem to have lost.
Potatoswatter
@Potatoswatter: Your comment, "Calling thread library synchronization functions, whatever they are on your platform, should take care of invalidating locally 'cached' values and making the compiler reload globals," interests me. Where can I find out more about how it does this? I know that in Java, when you lock/unlock, you are guaranteed to see the most recent variable values, because I've read about it. How do I know this is the case for C++? Especially since I'm using threads and locks from the QT Toolkit, and I did not see anything in the API about this.
jbu
@Chris: I'm not referring to hardware caches… they are required to be coherent by themselves, and operation is transparent. No, there isn't such a requirement… however neither is there a guarantee that global objects aren't modified by library functions. `volatile` is only necessary here if the library thinks `sleep` is a pure, stateless function. Which it is, but it would be a dumb move on the implementer's part.
Potatoswatter
@jbu: Java multithreading is better-specified than C++. I should find a better reference (that HP paper specifically compares Java and C++), but see the above reply to David.
Potatoswatter
@Potatoswatter: Without actual synchronization primitives, there is no guarantee whatsoever on what the compiler can generate from the user code above. I will read in depth the documents you are referring to, but note that n1680 is accepted only as a rationale document to explain the problem not as a guideline for compiler vendors.
David Rodríguez - dribeas
@David: Right — see my edit. N1680 (and papers of that genre, that one really isn't the best and I was just looking for a ref) refer to proper synchronization primitives, not `sleep`.
Potatoswatter
@Potato - It might be a dumb move, but the question is (to my mind) whether the volatile keyword is required to get defined behaviour by a strict reading of the C++ spec. Practically speaking, the code will behave correctly without `volatile`. I'm not 100% convinced that the compiler has to treat library calls as black boxes that might access global variables that have not been passed (by some kind of reference).
Chris Becke
@Chris: See the disclaimer. What more can I say. I don't really want to advocate either `volatile` or `sleep`, but yes, the latter strictly requires the former.
Potatoswatter
@David: ah, the key is to read the papers by Boehm. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2176.html looks very juicy, but right now I need to go to sleep :v( .
Potatoswatter
We agree that proper synchronization is required; we still disagree on your second paragraph. `sleep` has no effect on thread correctness whatsoever: it is fully unrelated to the problem at hand. Reading the answer, you can get the idea that 'sleep is likely to cause such effect', with 'such effect' being 'invalidating locally "cached" values and making the compiler reloads globals', which is really misleading.
David Rodríguez - dribeas
@Chris: The act of spawning a thread immediately violates §1.9/8, so a strictly conforming C++ implementation cannot allow multithreading at all. (Of course changed in C++0x.) To my mind, trying to construct a program out of wait-loops is not a subject for formalism but rather one of very practical damage control.
Potatoswatter
@David: Indeed `sleep` is insufficient for correctness, and it's not *guaranteed* to produce such an effect, but I would rate it as quite *likely* to have the right effect, as I've already said and clarified in the disclaimer. This is such an awful way to write software that anyone who writes a `sleep` loop and then worries about `volatile` has, I would say, their priorities backwards.
Potatoswatter
A: 

Technically, it does need to be marked volatile. Compilers are free to do what they want in order to optimise code, as long as the result continues to conform to the C++ abstract machine spec. A conforming compiler, with enough resources, could inline all instances of getNum and move the value of num into a register (or, noticing that it's never actually changed by any code, simply treat it as a constant) for the entire lifetime of the program.

Practically speaking, no (current) CPU has enough free registers that even the most aggressive optimizing compiler would choose to do that. But if correctness is desired, volatile is required.
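To make the hoisting concrete, here is the transformation performed by hand (the bounded loop and function name are illustrative, not compiler output; a real optimizer would simply spin forever):

```cpp
static int num = 0;

// What the programmer wrote:
//     while (getNum() == 0) { /* spin */ }
//
// After inlining getNum(), a conforming optimizer may behave as if the
// load were hoisted out of the loop.  This bounded loop shows the
// consequence: a later change to num is invisible, because the
// condition tests a stale copy.
int spinWithHoistedLoad(int maxIters)
{
    int cached = num;                 // the single load the optimizer may emit
    int iters = 0;
    while (cached == 0 && iters < maxIters)
    {
        ++iters;
        num = 1;                      // changing num does not end the loop:
    }                                 // the condition rereads only `cached`
    return iters;
}
```

With `num` declared volatile, the compiler would have to reread it on every iteration instead of keeping the cached copy.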

Chris Becke
Ok, but let's take a slightly different situation. Say the static *num* now becomes a private member variable accessible through getNum, so that the compiler could not inline accesses. Would I still need to mark *num* as volatile? ...I think the correct answer in both cases (the original and this new one) is that I need to introduce synchronization through locks.
jbu
I think most compilers are smart enough to realise that a global variable in a multithreaded program may be changed by another thread!
James Anderson
It does not matter how convoluted you make it. The C++ abstract machine does not recognize threads, so a variable that can be changed by a different thread is being changed outside the current abstract machine. Hence, to be correct, it NEEDS to be tagged volatile.
Chris Becke
+2  A: 

What you propose is basically a misuse of "volatile". Its real purpose is to tell the compiler that the variable may be modified by external hardware or another process in the system, and therefore it needs to be genuinely read from memory every time it's used.

It will not protect you from thread collisions etc. within your program, although in your case it looks like you are using the flag to signal a shutdown request.

It's actually OK to do this without synchronising, etc., provided you know that only a single controlling thread will update the variable. Also, I would use bit manipulation to set the flag, as this is more likely to be "atomic" on more hardware:

num & 0x00000001
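A sketch of a more portable way to get that atomicity, assuming C++11's `std::atomic` is available (which postdates this answer); `fetch_or` makes the set-the-bit operation a genuine atomic read-modify-write, and the names here are hypothetical:

```cpp
#include <atomic>

static std::atomic<int> num{0};

// Hypothetical shutdown flag: the single controlling thread sets the
// low bit; any number of reader threads may test it.
void requestShutdown()
{
    num.fetch_or(0x00000001);   // atomic read-modify-write on the flag bit
}

bool shutdownRequested()
{
    return (num.load() & 0x00000001) != 0;
}
```

Unlike volatile, the atomic also gives the readers a guaranteed-fresh value rather than relying on the hardware happening to behave.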
James Anderson
Yes, I think I should be using synchronization mechanisms to be sure and correct.
jbu
+1  A: 

Unfortunately volatile semantics are kinda wishy-washy. The concept of volatile wasn't really meant to be used for threading.

Potatoswatter is correct that calling the OS synchronization primitives will normally prevent the optimizing compiler from hoisting the read of num from the loop. But it works for sorta the same reason that using an accessor method works... by accident.

The compiler sees you calling a function that isn't immediately available for inlining or analysis, so it has to assume that any variable that could be used by some other function might be read or altered in that opaque function. So before doing the call, the compiler needs to write all those "global" variables back to memory.

At Corensic, we added an inline function to jinx.h that does this in a more direct way. Something like the following:

 inline void memory_barrier() { asm volatile("nop" ::: "memory"); }

This is rather subtle, but it effectively tells the compiler (gcc) that it can't get rid of this chunk of opaque asm and that the opaque asm can read or write any globally visible piece of memory. This effectively stops the compiler from reordering loads/stores across this boundary.

For your example:

 memory_barrier();
 while (num == 0)
 {
    memory_barrier();
    ...
 }

Now the read of num is stuck in place. And potentially more importantly, it's stuck in place with relation to other code. So you could have:

 while (flag == 0) { memory_barrier(); }  // spin
 process data[0..N]

And another thread does:

 populate data[0..N]
 memory_barrier();
 flag = 1;

PS. If you do this type of thing (essentially creating your own synchronization primitives) the perf wins can be big but the quality risk is high. Jinx is particularly good at finding bugs in these lock-free structures. So you might want to use it or some other tool to help test this stuff.

PPS. The Linux community has a nice post about this called "volatile considered harmful"; check it out.

Dave Dunn