I've been reading about the memory model in the upcoming C++0x standard, and I'm a bit confused about some of the restrictions on what the compiler is allowed to do, specifically regarding speculative loads and stores.
To start with, some relevant references:
Hans Boehm's pages about threads and the memory model in C++0x
Boehm, "Threads Cannot be Implemented as a Library"
Boehm and Adve, "Foundations of the C++ Concurrency Memory Model"
Boehm, "Concurrency memory model compiler consequences", N2338
Now, the basic idea is essentially "Sequential Consistency for Data-Race-Free Programs", which seems to be a decent compromise between ease of programming and giving the compiler and hardware room to optimize. A data race is defined to occur if two accesses to the same memory location by different threads are not ordered (by happens-before), at least one of them stores to the memory location, and at least one of them is not a synchronization action. This implies that all read/write access to shared data must go through some synchronization mechanism, such as mutexes or operations on atomic variables. (Experts can operate on atomic variables with relaxed memory ordering, but the default provides sequential consistency.)
In light of this, I'm confused about the restrictions on spurious or speculative loads/stores on ordinary shared variables. For instance, in N2338 we have the example
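To make the definition concrete, here is a minimal sketch of a data-race-free program in which an ordinary variable is shared through an atomic flag. The names `publish_and_read`, `payload`, `ready`, and `seen` are made up for illustration and do not come from any of the papers above; the code assumes a compiler that provides `<atomic>` and `<thread>`.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one thread publishes a value, another reads it.
int publish_and_read() {
    int payload = 0;                 // ordinary (non-atomic) shared variable
    std::atomic<bool> ready(false);  // synchronization variable
    int seen = -1;                   // written only by the consumer thread

    std::thread producer([&] {
        payload = 42;        // ordinary store
        ready.store(true);   // atomic store, sequentially consistent by default
    });
    std::thread consumer([&] {
        // Wait until the atomic load observes the producer's store.
        while (!ready.load()) {
            std::this_thread::yield();
        }
        seen = payload;      // ordered after the store to payload, so no race
    });

    producer.join();
    consumer.join();
    return seen;             // join() orders this read after the consumer
}
```

The two accesses to `payload` are ordered through the store and load on `ready`, so they do not form a data race. Replacing `ready` with a plain `bool` would leave both the flag and `payload` unordered, which is exactly the situation the definition rules out.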
switch (y) {
case 0: x = 17; w = 1; break;
case 1: x = 17; w = 3; break;
case 2: w = 9; break;
case 3: x = 17; w = 1; break;
case 4: x = 17; w = 3; break;
case 5: x = 17; w = 9; break;
default: x = 17; w = 42; break;
}
which the compiler is not allowed to transform into
tmp = x; x = 17;
switch (y) {
case 0: w = 1; break;
case 1: w = 3; break;
case 2: x = tmp; w = 9; break;
case 3: w = 1; break;
case 4: w = 3; break;
case 5: w = 9; break;
default: w = 42; break;
}
since if y == 2 there is a spurious write to x, which could be a problem if another thread were concurrently updating x. But why is this a problem? This is a data race, which is prohibited anyway; in this case, the compiler just makes it worse by writing to x twice, but even a single write would be enough for a data race, no? That is, a proper C++0x program would need to synchronize access to x, in which case there would no longer be a data race, and the spurious store wouldn't be a problem either?
I'm similarly confused about Example 3.1.3 in N2197 and some of the other examples as well, but maybe an explanation for the above issue would explain that too.
EDIT: The Answer:
The reason why speculative stores are a problem is that in the switch statement example above, the programmer might have elected to conditionally acquire the lock protecting x only if y != 2. Hence the speculative store might introduce a data race that was not there in the original code, and the transformation is thus forbidden. The same argument applies to Example 3.1.3 in N2197 as well.
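The conditional-locking scenario can be sketched roughly as follows. This is only an illustration under assumptions: `x_mutex`, `switch_thread`, `other_thread`, and `run_demo` are hypothetical names that do not appear in N2338, and the locking scheme is one possible way a programmer might have protected x.

```cpp
#include <mutex>
#include <thread>

int x = 0;           // shared, protected by x_mutex (assumed convention)
int w = 0;           // touched only by the thread running the switch
std::mutex x_mutex;  // hypothetical lock guarding x

// The original switch, with the lock taken only on paths that store to x.
void switch_thread(int y) {
    std::unique_lock<std::mutex> lock(x_mutex, std::defer_lock);
    if (y != 2) {
        lock.lock();  // case 2 never touches x, so it skips the lock
    }
    switch (y) {
    case 0: x = 17; w = 1; break;
    case 1: x = 17; w = 3; break;
    case 2: w = 9; break;
    case 3: x = 17; w = 1; break;
    case 4: x = 17; w = 3; break;
    case 5: x = 17; w = 9; break;
    default: x = 17; w = 42; break;
    }
}   // the lock is released here if it was acquired

// Another thread that updates x, always under the lock.
void other_thread(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> guard(x_mutex);
        ++x;
    }
}

// Runs both threads with y == 2 and returns the final value of x.
int run_demo(int n) {
    x = 0;
    std::thread a([n] { for (int i = 0; i < n; ++i) switch_thread(2); });
    std::thread b(other_thread, n);
    a.join();
    b.join();
    return x;
}
```

As written, the program is data-race-free: with y == 2 the first thread never accesses x, so `run_demo(n)` returns n. If the compiler hoisted `tmp = x; x = 17;` above the switch and restored `x = tmp` in case 2, those accesses would happen without the lock and could overwrite increments made by `other_thread`, a race that exists only in the transformed code.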