ansaurus

Question

C++0x memory model and speculative loads/stores

Answer 1

+3 A:

I'm not familiar with all the stuff you refer to, but notice that in the y==2 case, in the first bit of code, x is not written to at all (or read, for that matter). In the second bit of code, it is written twice. This is more of a difference than just writing once vs. writing twice (at least, it is in existing threading models such as pthreads). Also, storing a value which would not otherwise be stored at all is more of a difference than just storing once vs. storing twice. For both these reasons, you don't want compilers just replacing a no-op with tmp = x; x = 17; x = tmp;.

Suppose thread A wants to assume that no other thread modifies x. It's reasonable to want it to be allowed to expect that if y is 2, and it writes a value to x, then reads it back, it will get back the value it has written. But if thread B is concurrently executing your second bit of code, then thread A could write to x and later read it, and get back the original value, because thread B saved "before" the write and restored "after" it. Or it could get back 17, because thread B stored 17 "after" the write, and stored tmp back again "after" thread A reads. Thread A can do whatever synchronisation it likes, and it won't help, because thread B isn't synchronised. The reason it's not synchronised (in the y==2 case) is that it's not using x. So the concept of whether a particular bit of code "uses x" is important to the threading model, which means compilers can't be allowed to change code to use x when it "shouldn't".

In short, if the transformation you propose were allowed, introducing a spurious write, then it would never be possible to analyse a bit of code and conclude that it does not modify x (or any other memory location). There are a number of convenient idioms which would therefore be impossible, such as sharing immutable data between threads without synchronisation.

So, although I'm not familiar with C++0x's definition of "data race", I assume that it includes some conditions where programmers are allowed to assume that an object is not written to, and that this transformation would violate those conditions. I speculate that if y==2, then your original code, together with concurrent code: x = 42; x = 1; z = x in another thread, is not defined to be a data race. Or at least if it is a data race, it's not one which permits z to end up with value either 17, or 42.

Consider that in this program, the value 2 in y might be used to indicate, "there are other threads running: don't modify x, because we aren't synchronised here, so that would introduce a data race". Perhaps the reason there's no synchronisation at all, is that in all other cases of y, there are no other threads running with access to x. It seems reasonable to me that C++0x would want to support code like this:

if (single_threaded) {
    x = 17;
} else {
    sendMessageThatSafelySetsXTo(17);
}

Clearly then, you don't want that transformed to:

tmp = x;
x = 17;
if (!single_threaded) {
    x = tmp;
    sendMessageThatSafelySetsXTo(17);
}

Which is basically the same transformation as in your example, but with only 2 cases, instead of there being enough to make it look like a good code-size optimisation.

Steve Jessop 2010-01-04 20:31:04

In C++0x, a data race occurs if two accesses to the same memory location by different threads are not ordered, at least one of them stores to the memory location, and at least one of them is not a synchronization action. I added this to the question.

janneb 2010-01-04 21:21:29

+1 excellent answer

swegi 2010-01-04 21:24:58

@janneb. Thanks, in that case there is no data race in the first code snippet if y==2, and another thread accesses x, but there is a data race in the second snippet if y==2 and another thread accesses x. Clearly the compiler must not be allowed to add data races to otherwise race-free code, or the whole model is useless. Hence the transformation is forbidden.

Steve Jessop 2010-01-04 22:22:45

Answer 2

+2 A:

If y==2, and another thread modifies or reads x, how is there a race condition in the original sample? This thread never touches x, so other threads can do so freely.

But with the reordered version, our thread modifies x, if only temporarily, so if another thread also manipulates it, we now have a race condition where none was present before.

jalf 2010-01-04 20:43:28

ansaurus

tags:

views:

answers:

C++0x memory model and speculative loads/stores

related questions