ansaurus

Question

Answer 1

A:

Interlocked.Exchange() should guarantee that the value is flushed to all CPUs properly - it provides its own memory barrier.

I'm surprised that the compiler is complaining about passing a volatile into Interlocked.Exchange() - the fact that you're using Interlocked.Exchange() should almost mandate a volatile variable.

The problem you might see is that if the compiler does some heavy optimization of Bar() and realizes that nothing changes the value of m_value it can optimize away your check. That's what the volatile keyword would do - it would hint to the compiler that that variable may be changed outside of the optimizer's view.

Aaron 2009-11-18 18:33:27

But is the barrier placed before or after the assignment (or both)?

2009-11-18 18:59:34

Imagine that it happens "with" the assignment. For reads (acquires), the read is added to the read queue, then the CPU waits for the read queue to be flushed before adding anything else to it. This way, subsequent reads (as listed in code) don't get reordered (inside the read queue) before your important read of m_value. For writes, the write queue is flushed first, then the write of m_value is queued. This way, earlier writes don't happen after the write to m_value. For full barriers, both queues are flushed. Note, again, that the important part is how the value is ordered relative to other values...

tony 2009-11-19 19:57:24

@Aaron: re "flushed to all CPUs" - a single sided barrier (ie only from one CPU) is not enough. The reading CPUs need to have barriers to prevent reads from being re-ordered.

tony 2009-11-19 21:26:17

@tony - That sounds correct.

Aaron 2009-11-19 23:19:57

Answer 2

A:

The interlocked exchange operations guarantee a memory barrier.

The following synchronization functions use the appropriate barriers to ensure memory ordering:

Functions that enter or leave critical sections

Functions that signal synchronization objects

Wait functions

Interlocked functions

(Source : link)

But you are out of luck with register variables. If m_value is in a register in Bar, you won't see the change to m_value. Due to this, you should declare shared variables 'volatile'.

Christopher 2009-11-18 18:38:33

Is that a block quote from another source? If so, a citation would be good.

Heath Hunnicutt 2009-11-18 18:42:59

Link added. Was copied from a local MSDN.

Christopher 2009-11-18 19:03:51

Answer 3

A:

Your question starts with a very questionable assumption: "Bar() runs AFTER Foo()". There are few ways to guarantee that's always the case. The obvious one is to hold off starting the second thread until after Foo() has run. No problem there, the thread is guaranteed to start with an updated cache.

The only other way is to use a sync object to ensure that Bar() can't run before (or during) Foo(). Now you've already got the barrier, it is no longer necessary to interlock. In other words, your AFTER clause already provides enough guarantees that Bar() sees the updated value.

Hans Passant 2009-11-18 18:48:06

you are missing the point of the question..

2009-11-18 18:58:44

Hmm, maybe you are missing the point of the answer. Document exactly *how* you guarantee that Bar runs after Foo and I can probably give you a better answer.

Hans Passant 2009-11-18 19:26:24

I don't guarantee it in any way since this isn't "real code", just an example. I'm asking about a scenario in which Bar() did run after Foo() [as a result of the thread scheduler; there isn't any kind of synchronization in the code itself].

2009-11-18 19:47:29

If Bar() accidentally runs after Foo() then the implied memory barrier provided by Interlocked.Exchange() is sufficient for it to see the update.

Hans Passant 2009-11-18 20:31:45

@nobugz - no, the single barrier is not enough. You need a barrier in each thread (or processor, really). The writing thread does a release barrier, the reading thread needs an acquire barrier. Also, this matters not so much for m_value itself as for the other data that m_value flags as being 'ready', or whatever else might depend on m_value being set.

tony 2009-11-19 19:02:13

Answer 4

+1 A:

Memory barriers don't particularly help you. They specify an ordering between memory operations; in this case each thread has only one memory operation, so it doesn't matter. One typical scenario is writing non-atomically to fields in a structure, issuing a memory barrier, then publishing the address of the structure to other threads. The barrier guarantees that the writes to the structure's members are seen by all CPUs before they get its address.

What you really need are atomic operations, ie. InterlockedXXX functions, or volatile variables in C#. If the read in Bar were atomic, you could guarantee that neither the compiler, nor the cpu, does any optimizations that prevent it from reading either the value before the write in Foo, or after the write in Foo depending on which gets executed first. Since you are saying that you "know" Foo's write happens before Bar's read, then Bar would always return true.

Without the read in Bar being atomic, it could be reading a partially updated value (ie. garbage), or a cached value (either from the compiler or from the CPU), both of which may prevent Bar from returning true which it should.

Most modern CPUs guarantee that word-aligned reads are atomic, so the real trick is that you have to tell the compiler that the read is atomic.
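The publish-a-structure scenario described above can be sketched in C++ with std::atomic. This is only an illustration under stated assumptions: the Config type, g_published, publish and read_area are hypothetical names that do not come from the question, and release/acquire ordering on a std::atomic pointer stands in for the barrier the answer talks about.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload type, invented for this sketch.
struct Config {
    int width;
    int height;
};

// Shared slot; nullptr means "nothing published yet".
std::atomic<Config*> g_published{nullptr};

// Writer: fill in the fields with plain stores, then publish the address.
// The release store keeps the field writes ordered before the pointer write.
void publish(Config* c, int w, int h) {
    c->width = w;
    c->height = h;
    g_published.store(c, std::memory_order_release);
}

// Reader: the acquire load pairs with the release store above, so a
// non-null pointer implies the field writes are visible as well.
// Returns width * height, or -1 if nothing has been published yet.
int read_area() {
    Config* c = g_published.load(std::memory_order_acquire);
    if (c == nullptr) return -1;
    return c->width * c->height;
}
```

A reader that calls read_area() before the writer has published simply gets -1; once it observes the pointer, it also observes the fields written before it.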

Greg Rogers 2009-11-18 19:13:30

Answer 5

A:

I'm not completely sure, but I think Interlocked.Exchange uses the InterlockedExchange function of the Windows API, which provides a full memory barrier anyway.

This function generates a full memory barrier (or fence) to ensure that memory operations are completed in order.

Jorge Córdoba 2009-11-18 19:20:09

Does it matter that the memory barrier was used in thread #1 when the value was written, but not on thread #2 when the value is read?

2009-11-18 19:48:53

It does matter - you need a barrier on the reading thread, else the reading thread might reorder some other reads before the read of m_value, e.g. if (m_value) { Foo foo = important_shared_data; }. If you don't want important_shared_data read early, then you need a read/acquire barrier on m_value (and a release on the write). If you don't have other data dependent on m_value, then you probably don't need the barriers at all - but how often do you have data that isn't dependent on other things? i.e. is m_value useless?

tony 2009-11-19 19:52:35

Answer 6

A:

If you don't tell the compiler or runtime that m_value should not be read ahead of Bar(), it can and may cache the value of m_value ahead of Bar() and simply use the cached value. If you want to ensure that it sees the "latest" version of m_value, either shove in a Thread.MemoryBarrier() or use Thread.VolatileRead(ref m_value). The latter is less expensive than a full memory barrier.

Ideally you could shove in a ReadBarrier, but the CLR doesn't seem to support that directly.

EDIT: Another way to think about it is that there are really two kinds of memory barriers: compiler memory barriers that tell the compiler how to sequence reads and writes and CPU memory barriers that tell the CPU how to sequence reads and writes. The Interlocked functions use CPU memory barriers. Even if the compiler treated them as compiler memory barriers, it still wouldn't matter, as in this specific case, Bar() could have been separately compiled and not known of the other uses of m_value that would require a compiler memory barrier.

MSN 2009-11-18 22:18:28

How come Thread.VolatileRead is less expensive? According to Reflector, it still uses a full fence (actually calling MemoryBarrier).

2009-11-19 17:15:12

I haven't looked at the implementation, but according to the documentation (eg http://msdn.microsoft.com/en-us/library/aa645755(VS.71).aspx) VolatileRead just does an AcquireBarrier (ie a 'half-barrier'), whereas Interlocked and/or MemoryBarrier does a full barrier. So VolatileRead could be less expensive. However, on some/most Intel platforms, all the barriers are implemented as full barriers anyhow.

tony 2009-11-19 21:35:15

But again, there's a difference between a processor barrier and a compiler barrier. In this particular case, you at least want a compiler barrier.

MSN 2009-11-19 21:51:08

Answer 7

+2 A:

The usual pattern for memory barrier usage matches what you would put in the implementation of a critical section, but split into pairs for the producer and consumer. As an example your critical section implementation would typically be of the form:

while (!pShared->lock.testAndSet_Acquire()) ;
// (This loop should include all the normal critical section stuff like
// spin, waste, pause() instructions, and last-resort give-up-and-block
// on a resource until the lock is made available.)

// Access to shared memory.

pShared->foo = 1;
v = pShared->goo;

pShared->lock.clear_Release();

The acquire memory barrier above makes sure that any loads (pShared->goo) that may have been started before the successful lock modification are tossed, to be restarted if necessary.

The release memory barrier ensures that the load from goo into the (local say) variable v is complete before the lock word protecting the shared memory is cleared.
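As a rough, compilable counterpart to the pseudocode above, here is a minimal sketch using std::atomic_flag. The SpinLock and Shared types and the access_shared function are hypothetical names for illustration only. Note one difference from the pseudocode: std::atomic_flag::test_and_set returns the previous value, so true means the lock was already held.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Minimal spin lock (illustrative only; prefer std::mutex in real code).
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    // Acquire: test_and_set returns the previous value, so we spin while
    // it reports that someone else already holds the lock.
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while waiting
        }
    }
    // Single attempt; returns true if the lock was taken.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }
    // Release: accesses made inside the section complete before the clear.
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Hypothetical shared state mirroring pShared->foo and pShared->goo.
struct Shared {
    SpinLock lock;
    int foo = 0;
    int goo = 0;
};

// Same shape as the pseudocode: write foo and read goo under the lock.
int access_shared(Shared& s) {
    s.lock.lock();
    s.foo = 1;
    int v = s.goo;
    s.lock.unlock();
    return v;
}
```

The acquire on entry and release on exit are what keep the accesses to foo and goo from drifting outside the locked region.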

You have a similar pattern in the typical producer and consumer atomic flag scenario (it is difficult to tell from your sample if that is what you are doing, but it should illustrate the idea).

Suppose your producer used an atomic variable to indicate that some other state is ready to use. You'll want something like this:

pShared->goo = 14;

pShared->atomic.setBit_Release();

Without a "write" barrier here in the producer you have no guarantee that the hardware isn't going to get to the atomic store before the goo store has made it through the cpu store queues, and up through the memory hierarchy where it is visible (even if you have a mechanism that ensures the compiler orders things the way you want).

In the consumer

if ( pShared->atomic.compareAndSwap_Acquire(1,1) )
{
   v = pShared->goo;
}

Without a "read" barrier here you won't know that the hardware hasn't gone and fetched goo for you before the atomic access is complete. The atomic (ie: memory manipulated with the Interlocked functions doing stuff like lock cmpxchg), is only "atomic" with respect to itself, not other memory.
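The producer and consumer halves can be put together in a small C++ sketch. It assumes std::atomic<bool> in place of the setBit_Release and compareAndSwap_Acquire calls of the pseudocode; the Channel type and the produce and try_consume names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state: plain data plus the atomic "ready" flag.
struct Channel {
    int goo = 0;                     // ordinary, non-atomic payload
    std::atomic<bool> ready{false};  // stands in for pShared->atomic
};

// Producer: write the payload first, then set the flag with release
// semantics so the payload store cannot be ordered after the flag store.
void produce(Channel& c, int value) {
    c.goo = value;
    c.ready.store(true, std::memory_order_release);
}

// Consumer: read the flag with acquire semantics so the payload load
// cannot be satisfied before the flag has been observed as set.
// Returns false (leaving out untouched) if nothing is ready yet.
bool try_consume(Channel& c, int& out) {
    if (!c.ready.load(std::memory_order_acquire)) return false;
    out = c.goo;
    return true;
}
```

If either side drops its half of the pair, for example by using relaxed ordering, the consumer may see the flag set and still read a stale goo on weakly ordered hardware.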

Now, the remaining thing that has to be mentioned is that the barrier constructs are highly unportable. Your compiler probably provides _acquire and _release variations for most of the atomic manipulation methods, and these are the sorts of ways you would use them. Depending on the platform you are using (ie: ia32), these may very well be exactly what you would get without the _acquire() or _release() suffixes. Platforms where this matters are ia64 (effectively dead except on HP where it's still twitching slightly), and powerpc. ia64 had .acq and .rel instruction modifiers on most load and store instructions (including the atomic ones like cmpxchg). powerpc has separate instructions for this (isync and lwsync give you the read and write barriers respectively).

Now. Having said all this. Do you really have a good reason for going down this path? Doing all this correctly can be very difficult. Be prepared for a lot of self doubt and insecurity in code reviews and make sure you have a lot of high concurrency testing with all sorts of random timing scenarios. Use a critical section unless you have a very very good reason to avoid it, and don't write that critical section yourself.

Peeter Joot 2009-11-19 04:43:11

great answer, thanks (just for the sake of interest.. multithreaded and lock free programming is a very interesting topic, so I'm exploring this area.. [btw, do you have recommendations about any good books about these topics? [besides Joe Duffy's book ..]])

2009-11-19 17:25:17

I don't actually know Joe Duffy's book ... everything I've learned on the subject was "on the job". I get a lot of questions on the topic, having originally coded our product's lock-free implementation of reader-writer mutexes, and all the inline assembly for our product's atomic interfaces (before the days when the compilers provided nice intrinsics for this like they do now). The best I have for references is here: http://sites.google.com/site/peeterjoot/math2009/atomic.pdf ... it's just the references for a paper on the topic I've never actually gotten to writing :)

Peeter Joot 2009-11-20 03:38:04

(that url now has more than links but I can't promise the content is coherent;)

Peeter Joot 2009-12-05 05:26:55

Answer 8

A:

If m_value is not marked as volatile, then there is no reason to think that the value read in Bar is fenced. Compiler optimizations, caching, or other factors could reorder the reads and writes. Interlocked exchange is only helpful when it is used in an ecosystem of properly fenced memory references. This is the whole point of marking a field volatile. The .NET memory model is not as straightforward as some might expect.

Jeffrey L Whitledge 2009-11-19 04:55:18

Would it be acceptable to mark m_value as volatile and pass it by reference to the Interlocked method? (though the compiler warns about it .. though it seems meaningless in such case)

2009-11-19 17:11:41

Yes, in this case it is safe to ignore the compiler warning. Passing the variable (by reference) to any method other than the interlocked methods would be a problem, but for the interlocked methods it's OK.

Jeffrey L Whitledge 2009-11-19 19:05:02

If the variable is marked volatile, then (in C#, NOT C++) the compiler will add acquire/release barriers as needed, so Interlocked is not usually necessary (ie Interlocked gives a full barrier, volatile gives acquire on read and release on write. Typically that's all you need).

tony 2009-11-19 21:31:06

Volatile is not an alternative to Interlocked.Exchange(), because it does not ensure atomic sequences of read and write operations. Volatile only ensures that the operations will not be reordered.
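The difference can be sketched in C++, with std::atomic standing in for the .NET primitives as a loose analogy (an assumption of this sketch, not something the thread states): exchange plays the role of Interlocked.Exchange(), and an acquire load followed by a release store plays the role of a volatile read and write. Both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// One indivisible read-modify-write: swap in 1 and get the old value back.
// Only the caller that sees the old value 0 wins, however many threads race.
bool claim_with_exchange(std::atomic<int>& slot) {
    return slot.exchange(1) == 0;
}

// Ordered but NOT indivisible: each access is atomic and ordered on its own,
// yet another thread can run between the load and the store, so two racing
// callers can both observe 0 and both believe they won.
bool claim_with_load_then_store(std::atomic<int>& slot) {
    if (slot.load(std::memory_order_acquire) != 0) return false;
    slot.store(1, std::memory_order_release);
    return true;
}
```

Single-threaded, the two functions behave identically; the difference only shows up when several threads try to claim the slot at the same time.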

Jeffrey L Whitledge 2009-11-23 14:24:35


Interlocked and Memory Barriers
