ansaurus

Answer 1

What does it mean?

It means that if you have:

read
read
read
READ BARRIER
read
read
read

then the read barrier acts as a "join point" dividing these reads into two batches. All the reads preceding the read barrier will have been done before any read following the read barrier is begun.

Which loads in bar() must complete before the load of a (#4) is begun?

All reads of b (#3) are forced to precede any read of a (#4). This means that a is not read until after b is no longer 0. Because foo() uses a write barrier, a has already been changed to 1 (#1) by the time b is changed (#2). The two barriers thus work together to ensure the assert statement always succeeds.
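To make the pairing concrete, here is a minimal sketch of foo() and bar() using C11 atomics and POSIX threads. It is not the article's code: C11 has no pure read or write barrier, so release and acquire fences stand in for them (each is somewhat stronger than the barrier it replaces), and `run_once` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared variables, both initially 0, as in the question's example. */
static atomic_int a;
static atomic_int b;

/* Writer: store a (#1), write barrier, then store b (#2). */
static void *foo(void *unused)
{
    (void)unused;
    atomic_store_explicit(&a, 1, memory_order_relaxed);  /* #1 */
    atomic_thread_fence(memory_order_release);           /* stand-in for the write barrier */
    atomic_store_explicit(&b, 1, memory_order_relaxed);  /* #2 */
    return NULL;
}

/* Reader: spin on b (#3), read barrier, then load a (#4). */
static void *bar(void *result)
{
    while (atomic_load_explicit(&b, memory_order_relaxed) == 0)  /* #3 */
        continue;
    atomic_thread_fence(memory_order_acquire);           /* stand-in for the read barrier */
    int seen = atomic_load_explicit(&a, memory_order_relaxed);   /* #4 */
    assert(seen == 1);
    *(int *)result = seen;
    return NULL;
}

/* Hypothetical helper: run foo() and bar() once on two threads and
   return the value bar() observed for a. */
static int run_once(void)
{
    pthread_t writer, reader;
    int seen_a = -1;

    atomic_store(&a, 0);
    atomic_store(&b, 0);
    pthread_create(&reader, NULL, bar, &seen_a);
    pthread_create(&writer, NULL, foo, NULL);
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    return seen_a;
}
```

With both fences in place, the store to a happens-before the load of a whenever bar() has seen b == 1, so `run_once()` returns 1. Remove either fence and the C11 model no longer guarantees that, even if a particular machine never shows the failure.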

Jeremy W. Sherman 2010-10-22 18:20:51

Thanks for your answer, but if we look at the example in point 4.3 of the article, all reads of b (#3) precede the read of a even without a memory barrier, and the assert still fails: CPU 1 executes the assert(a==1), and, since the old value of “a” is still in CPU 1’s cache, this assertion fails.

confucius 2010-10-22 18:45:59

In the fixed code (with the read barrier) CPU 1 executes the assert(a==1), and, since the cache line containing “a” is no longer in CPU 1’s cache (because the read barrier forced the cache line to be invalidated), it transmits a "read" message.

confucius 2010-10-22 18:48:08

So it's not just an ordering, right? I don't see how it could be explained with just saying that it forces the ordering. I think there is something that I don't understand, some fundamental detail, that doesn't allow me to join all the information I have and get the final picture.

confucius 2010-10-22 18:51:04

I think the issue is that the wrap-up is at a higher level of abstraction. Forget temporarily about caching, and imagine that all reads are actually being carried out for the first time before/after the read barrier. That's the end result of invalidating the cache: the processor is forced to perform the read of `a` after it hits the read barrier. Without the read barrier, it's free to read `a` and then loop till `b` is non-zero. The barrier ensures that the read of `a` is not reordered before the barrier, i.e., before `b` is nonzero.
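The two levels of description can be lined up with a toy, single-threaded model of CPU 1. This is only a sketch: the struct, the function names, and the single pending-invalidate flag are all hypothetical simplifications of the cache and invalidate queue the article describes. It starts at the moment bar() has already seen b == 1 while a stale copy of a is still cached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of CPU 1: a cached copy of `a` plus one queued invalidate
   that has been acknowledged but not yet applied. Names are hypothetical. */
struct toy_cpu {
    int  cached_a;            /* CPU 1's cached value of a */
    bool a_line_valid;        /* is the cache line for a still usable? */
    bool invalidate_pending;  /* queued invalidate for a, not yet applied */
};

static const int memory_a = 1;  /* the value CPU 0 has already written */

/* Read barrier in this model: apply the queued invalidate. */
static void toy_read_barrier(struct toy_cpu *cpu)
{
    if (cpu->invalidate_pending) {
        cpu->a_line_valid = false;
        cpu->invalidate_pending = false;
    }
}

/* Load a: hit the cache if the line is valid, otherwise fetch the
   current value (the "read" message in the article). */
static int toy_load_a(struct toy_cpu *cpu)
{
    if (!cpu->a_line_valid) {
        cpu->cached_a = memory_a;
        cpu->a_line_valid = true;
    }
    return cpu->cached_a;
}

/* bar() from the point where it has already observed b == 1. */
static int toy_bar(bool use_read_barrier)
{
    struct toy_cpu cpu1 = { .cached_a = 0, .a_line_valid = true,
                            .invalidate_pending = true };
    if (use_read_barrier)
        toy_read_barrier(&cpu1);
    return toy_load_a(&cpu1);
}

/* Without the barrier the stale 0 is returned; with it, the fresh 1. */
static void toy_check(void)
{
    assert(toy_bar(false) == 0);
    assert(toy_bar(true) == 1);
}
```

Seen from outside, `toy_bar(false)` returning 0 is indistinguishable from a processor that read a before it finished looping on b, which is why the result can be described purely as reordering.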

Jeremy W. Sherman 2010-10-22 19:28:44

Does that mean this is said just to simplify understanding what is going on when you look at code? So the message queues and caching done by the processor simply make memory operations appear out of order from a single processor's point of view? For example, in function "bar" we can imagine that the processor reorders operations and reads "a" before it starts reading "b"? So this is said just so we don't have to think about caching and low-level implementation details every time?

confucius 2010-10-23 09:52:27

Yes! I think that is exactly what is going on. This is the sense in which these are "barriers" at all - they block reordering from a simplified (uniprocessor) point of view. Otherwise, it would make more sense to call the operations "flush write queue" and "invalidate read cache".

Jeremy W. Sherman 2010-10-23 17:06:52

By the way, you might find the article [*x86-TSO: A Rigorous and Usable Programmer's Memory Model for x86 Multiprocessors*](http://moscova.inria.fr/~zappa/readings/cacm10.pdf) by Owens, Sarkar, and Sewell, published in the May 2010 CACM, to be very helpful in thinking about these things.

Jeremy W. Sherman 2010-10-23 17:09:17

There's also an [extended version of the `x86-TSO` paper](http://www.cl.cam.ac.uk/~pes20/weakmemory/x86tso-paper.pdf) that includes proof outlines and a verified checker. The one published in CACM was really just a "research highlight". That said, it's the version I read, and it was definitely worth the time. Read the version in the actual issue of CACM if you get the chance; the diagrams and tables are much prettier and more readable than they are in the final-draft preprint I linked to.

Jeremy W. Sherman 2010-10-23 17:12:41

Thanks! That article looks interesting. I was searching for everything related to barriers, but I didn't find this one. Do you know any way of testing memory reordering? My compiler on x86 doesn't generate any barriers at all; maybe there is a way to simulate it on a virtual machine? How are the tests done if there is no Alpha processor available?

confucius 2010-10-23 17:14:18

The article should describe how they verified their model was accurate on actual x86 hardware. You could introduce barriers into your code by using `asm` statements.
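As a rough sketch of what such `asm` statements can look like, assuming GCC or Clang extended inline assembly on x86 (the macro names and the `publish`/`consume` pair are hypothetical, and other targets fall back to C11 fences):

```c
#include <assert.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* x86 fence instructions; the "memory" clobber also stops the
   compiler from moving memory accesses across the statement. */
#define READ_BARRIER()  __asm__ __volatile__("lfence" ::: "memory")
#define WRITE_BARRIER() __asm__ __volatile__("sfence" ::: "memory")
#define FULL_BARRIER()  __asm__ __volatile__("mfence" ::: "memory")
#else
/* Assumed fallback for other compilers or targets: C11 fences. */
#include <stdatomic.h>
#define READ_BARRIER()  atomic_thread_fence(memory_order_acquire)
#define WRITE_BARRIER() atomic_thread_fence(memory_order_release)
#define FULL_BARRIER()  atomic_thread_fence(memory_order_seq_cst)
#endif

/* Pre-C11 style shared variables; volatile keeps the accesses in the code. */
static volatile int data;
static volatile int ready;

/* Writer side: publish the data, then raise the flag. */
static void publish(int value)
{
    data = value;
    WRITE_BARRIER();
    ready = 1;
}

/* Reader side: wait for the flag, then read the data. */
static int consume(void)
{
    while (!ready)
        continue;
    READ_BARRIER();
    return data;
}

/* Single-threaded smoke test: only checks that the code builds and runs. */
static void barrier_demo(void)
{
    publish(42);
    assert(consume() == 42);
}
```

On x86, ordinary loads are not reordered with other loads and ordinary stores are not reordered with other stores, so for normal memory the read and write fences mostly matter as compiler barriers; a store followed by a load from a different location is the case that needs the full barrier. In new code, `<stdatomic.h>` is the portable way to get the same effect.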

Jeremy W. Sherman 2010-10-23 17:35:35

Thank you very much.

confucius 2010-10-23 17:46:20

Memory Fences - Need help to understand
