Hello all,
I'm reading Memory Barriers by Paul E. McKenney http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.07.23a.pdf . Everything is explained in great detail, but just when I think everything is clear I run into one sentence that undermines it all and makes me think I understood nothing. Let me show the example:
void foo(void)
{
    a = 1;                      /* #1 */
    b = 1;                      /* #2 */
}
void bar(void)
{
    while (b == 0) continue;    /* #3 */
    assert(a == 1);             /* #4 */
}
Let's say these two functions are running on different processors. What can happen is that the second processor sees the store to "a" (#1) after the store to "b" (#2), because the first processor queues the store to "a" in its store buffer and proceeds to the store to "b". OK, that's fine, we add a write fence between #1 and #2. But this code can still fail, because the second processor might queue the invalidate message, so we add one more memory fence (a read fence this time) between #3 and #4.
void foo(void)
{
    a = 1;                      /* #1 */
    write_memory_barrier();
    b = 1;                      /* #2 */
}
void bar(void)
{
    while (b == 0) continue;    /* #3 */
    read_memory_barrier();
    assert(a == 1);             /* #4 */
}
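For anyone who wants to try this, here is a rough C11 sketch of the fenced version. It is my own approximation, not code from the paper: C11 has no pure write-only or read-only fence, so I am assuming that a release fence can stand in for write_memory_barrier() and an acquire fence for read_memory_barrier() (both are somewhat stronger than the barriers the paper describes). The thread functions and the run_once() helper are hypothetical names, and POSIX threads are assumed to be available.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared variables from the example; atomics avoid a data race in C11. */
static atomic_int a;
static atomic_int b;

static void *foo_thread(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);          /* #1 */
    /* Assumed stand-in for write_memory_barrier(). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&b, 1, memory_order_relaxed);          /* #2 */
    return NULL;
}

static void *bar_thread(void *arg)
{
    int *seen_a = arg;
    while (atomic_load_explicit(&b, memory_order_relaxed) == 0)  /* #3 */
        continue;
    /* Assumed stand-in for read_memory_barrier(). */
    atomic_thread_fence(memory_order_acquire);
    *seen_a = atomic_load_explicit(&a, memory_order_relaxed);    /* #4 */
    return NULL;
}

/* Hypothetical helper: runs foo and bar once on two threads and
 * returns the value of "a" that bar observed at #4 (-1 on error). */
int run_once(void)
{
    pthread_t t_foo, t_bar;
    int seen_a = -1;

    atomic_store(&a, 0);
    atomic_store(&b, 0);

    if (pthread_create(&t_bar, NULL, bar_thread, &seen_a) != 0)
        return -1;
    if (pthread_create(&t_foo, NULL, foo_thread, NULL) != 0) {
        atomic_store(&b, 1);        /* release the spinning bar thread */
        pthread_join(t_bar, NULL);
        return -1;
    }

    pthread_join(t_foo, NULL);
    pthread_join(t_bar, NULL);
    return seen_a;
}
```

Calling run_once() from main and checking that it returns 1 corresponds to the assert at #4. A single passing run does not prove anything about ordering, of course; the guarantee comes from the two fences pairing up, not from the test.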
The read barrier forces the second processor to process all queued messages (the invalidate for "a") first, so at #4 it has to read "a" again by sending a MESI read message to the first processor. OK. Next the article says:
Many CPU architectures therefore provide weaker memory-barrier instructions that do only one or the other of these two. Roughly speaking, a “read memory barrier” marks only the invalidate queue and a “write memory barrier” marks only the store buffer, while a full-fledged memory barrier does both.
Great, that's clear, but after that I see this
The effect of this is that a read memory barrier orders only loads on the CPU that executes it, so that all loads preceding the read memory barrier will appear to have completed before any load following the read memory barrier. Similarly, a write memory barrier orders only stores, again on the CPU that executes it, and again so that all stores preceding the write memory barrier will appear to have completed before any store following the write memory barrier.
so
all loads preceding the read memory barrier will appear to have completed before any load following the read memory barrier
That muddles everything that was explained before. What does it mean? Which load in function "bar" has to complete before the load of "a" at #4? As I understand it, without the memory barrier in this function the assert could fail simply because the processor may read an old value, since it has not yet processed the invalidate for the cache line where "a" is located.
A detailed explanation would be really helpful; I've been trying to understand this all day.
Thanks very much in advance.