I've been reading Joe Duffy's book on concurrent programming. I have kind of an academic question about lockless threading.

First: I know that lockless threading is fraught with peril (if you don't believe me, read the sections in the book about the memory model).

Nevertheless, I have a question: suppose I have a class with an int property on it.

The value exposed by this property will be read very frequently by multiple threads.

It is extremely rare that the value will change, and when it does it will be a single thread that changes it.

If it does change while another operation that uses it is in flight, no one is going to lose a finger (the first thing anyone using it does is copy it to a local variable).

I could use locks (or a ReaderWriterLockSlim to keep the reads concurrent). I could mark the variable volatile (there are lots of examples where this is done).

However, even volatile can impose a performance hit.

What if I use VolatileWrite when it changes, and leave the access normal for reads? Something like this:


public class MyClass
{
  private int _TheProperty;
  internal int TheProperty
  {
    // Plain (non-volatile) read: no barrier, no extra cost.
    get { return _TheProperty; }
    // Volatile write on the rare update.
    set { System.Threading.Thread.VolatileWrite(ref _TheProperty, value); }
  }
}
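For completeness, a reader in this scheme would do what I described above and copy the value to a local first. A hypothetical consumer (the method name is made up) might look like:

int DoSomeWork(MyClass source)
{
  // Copy to a local first, so the rest of the method sees one
  // consistent value even if the writer changes it mid-operation.
  int theValue = source.TheProperty;
  return theValue * 2; // ... use theValue for the whole operation ...
}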

I don't think that I would ever try this in real life, but I'm curious about the answer (more than anything, as a checkpoint of whether I understand the memory model stuff I've been reading).

A: 

In C#, reads and writes of an int are atomic.

Since you said that only one thread writes to it, you should never have contention as to what is the proper value, and as long as you are caching a local copy, you should never get dirty data.

You may, however, want to declare it volatile if an OS thread will be doing the update.

Also keep in mind that some operations are not atomic, and can cause problems if you have more than one writer. For example, even though the bool type won't corrupt if you have more than one writer, a statement like this:

a = !a;

is not atomic: it is a read, a negation, and a write. If two threads execute it at the same time, you have a race condition.
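To make that concrete, here's a minimal sketch (the names are mine) of the lost-update race, plus the usual Interlocked fix, using an int since Interlocked doesn't work on bool:

public class ToggleExample
{
  private int _flag; // 0 = false, 1 = true

  // Racy with two writers: both can read the old value,
  // then both write, and one toggle is lost.
  public void ToggleRacy()
  {
    _flag = (_flag == 0) ? 1 : 0;
  }

  // Safe: retry the compare-exchange until our toggle wins.
  public void ToggleSafe()
  {
    int seen, next;
    do
    {
      seen = _flag;
      next = (seen == 0) ? 1 : 0;
    } while (System.Threading.Interlocked.CompareExchange(ref _flag, next, seen) != seen);
  }
}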

John Gietzen
There's a difference between "never seeing a value which has never been written" and "always seeing an up-to-date value". Without any memory barriers, it's possible to always see the old value.
Jon Skeet
@Jon: That's only *if* you declare it volatile, right?
John Gietzen
@John: Nope, the point of volatile is to try to get *away* from that problem.
Jon Skeet
@John: what Jon said. Reading about lockless development is like reading about quantum mechanics -- it's way weirder than it looks. One simple problem: memory isn't flat. Each CPU or each core might have its own cache. How do you know that your write made it out of the cache? If you use locking, hardware calls are made to ensure cache consistency. If you don't use locks, neither the compiler nor the hardware "knows" that it needs to be concerned about threaded access or about keeping the caches consistent. Then there's the fact that the CPU can re-order your instructions. Weird stuff.
JMarsch
I guess that's why Joel Spolsky always says that "Nobody is smart enough to do multithreading," eh?
John Gietzen
@John: Yah, I think he was right.
JMarsch
Caches stay consistent regardless of locks. Locks just also handle ordering and visibility internally. The only problem here is the C# runtime - it can keep things in registers if it wants. If it were just C/C++, you would know that *eventually* the memory would be updated, even without locks, barriers, etc.
tony
So, setting it volatile is basically the opposite of the C/C++ 'register' keyword?
John Gietzen
volatile in C/C++ is the opposite of register, but not in C#. In C# volatile is stronger - it adds memory barriers to ensure other memory reads/writes are not reordered around it (thus assuring that things work as expected even across threads). The whole problem highlighted by the question is that C# has thus lost that weaker volatile == !register. The memory-barrier volatile is up to 100x slower (i.e. a barrier can stall the CPU for hundreds of cycles while it waits for memory).
tony
@tony: Thanks for all the info. I was going to delete my post, but these comments are gold.
John Gietzen
+1  A: 

This is the pattern I use when the 'last writer wins' pattern is applicable to the situation. I had used the volatile keyword, but after seeing this pattern in a code example from Jeffrey Richter, I started using it instead.

codekaizen
Interesting. Which Richter book covered that?
JMarsch
I asked him at Devscovery Redmond in 2007. I thought it was also in CLR via C#.
codekaizen
+4  A: 

The question is whether the reading thread will ever see the change. It's not just a matter of whether it sees it immediately.

Frankly I've given up on trying to understand volatility - I know it doesn't mean quite what I thought it used to... but I also know that with no kind of memory barrier on the reading thread, you could be reading the same old data forever.
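A sketch of the failure mode (the names are mine): with a plain field, the reading loop below may legally never observe the write; marking the field volatile forces a real read on each iteration.

public class StaleReadExample
{
  private bool _stopPlain;             // plain: the loop may be JITted to test it once
  private volatile bool _stopVolatile; // volatile: every iteration re-reads, with acquire semantics

  public void SpinPlain()    { while (!_stopPlain) { } }    // may spin forever
  public void SpinVolatile() { while (!_stopVolatile) { } } // will see the writer's change
}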

Jon Skeet
I agree about giving up on understanding it! That's why I consider this an academic question lol. Whether it will "ever" be seen is exactly what prompted me to ask the question -- I'm not worried about immediately; even if you use locks, there's still a race (will the reader get the lock before the writer does?). I read Joe's blog quite a bit, and it seems like every time I think I get the volatility issue, I read a new post that confirms that it's weirder than I thought! Anyway, thanks for posting your thoughts!
JMarsch
@Jon Quick question: Isn't the memory barrier what prevents reordering of instructions, and the volatile read/write what causes cache flushes? Does a barrier imply a flush, or does a volatile read/write imply a barrier? (Or neither?) argh.
JMarsch
@JMarsch: The memory barrier imposes an order upon the *memory accesses* required by instructions; whether it prevents "reordering of instructions" is an implementation detail of the processor. The memory barrier merely provides a guarantee about the observable consequences of certain operations.
Eric Lippert
@Eric Lippert. Yes, well put Eric. A memory barrier simply ensures that any pending reads or writes will be perceived to have been completed. This stuff gets very complicated, very quickly, and when you throw L1 and L2 (and sometimes even L3) caches into the mix, the hardware synchronisation becomes almost byzantine in complexity.
zebrabox
+2  A: 

The "performance hit" of volatile is because the compiler now generates code to actually check the value instead of optimizing that away - in other words, you'll have to take that performance hit regardless of what you do.

Anon.
That is not the only performance hit of volatile. What is more relevant is how volatility changes how caches are flushed.
Eric Lippert
@Anon The difference that I'm exploring is this: if I use the volatile keyword, then the reads and writes will both be volatile. (In this thought experiment, there will be lots of reads and very few writes.) So I was examining whether an optimization was available where we would do a volatile write, but not volatile reads.
JMarsch
@Eric: I agree -- that's what I'm curious about. Today, it's just academic to me, but as I understand it, the penalty will grow as the number of cores (and hence caches to synchronize) grows. (Joe mentioned that in his book). So 4 cores is the norm today, but what kind of issues with this will we run into if/when 64 or 128 cores are the norm?
JMarsch
+1  A: 

For normal memory (as opposed to, say, memory-mapped devices), the cache-coherency protocols running within/between the CPU/CPUs are there to ensure that different threads sharing that memory get a consistent view of things (i.e., if I change the value of a memory location on one CPU, it will be seen by other CPUs that have that memory in their caches). In this regard volatile helps to ensure that the optimizer doesn't optimize away memory accesses (which always go through the cache anyway) by, say, reading the value cached in a register. The C# documentation seems pretty clear on this. Again, the application programmer doesn't generally have to deal with cache-coherency themselves.

I highly recommend reading the freely available paper "What Every Programmer Should Know About Memory". A lot of magic goes on under the hood that mostly prevents shooting oneself in the foot.

Oliver Seiler
+4  A: 

Marking a variable as "volatile" has two effects.

1) Reads and writes have acquire and release semantics, so that reads and writes of other memory locations will not "move forwards and backwards in time" with respect to reads and writes of this memory location. (This is a simplification, but you take my point.)

2) The code generated by the jitter will not "cache" a value that seems to logically be unchanging.

Whether the former point is relevant in your scenario, I don't know; you've only described one memory location. Whether or not it is important that you have only volatile writes but not volatile reads is something that is up to you to decide.

But it seems to me that the latter point is quite relevant. If you have a spin lock on a non-volatile variable:

while(this.prop == 0) {}

the jitter is within its rights to generate this code as though you'd written

if (this.prop == 0) { while (true) {} }

Whether it actually does so or not, I don't know, but it has the right to. If what you want is for the code to actually re-check the property on each go round the loop, marking it as volatile is the right way to go.
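For instance, a sketch of that fix (the field here is my stand-in for the prop in the snippet above):

public class SpinExample
{
  private volatile int prop; // volatile: the jitter may not hoist the read out of the loop

  public void WaitForNonZero()
  {
    // Each pass performs a genuine (acquire) read of prop.
    while (this.prop == 0) { }
  }

  // Alternatively, leave the field non-volatile and read it explicitly:
  //   while (System.Threading.Thread.VolatileRead(ref this.prop) == 0) { }
}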

Eric Lippert
Thank you Eric, and I agree -- that second point proves just how tricky it is to get it right if you want to go lockless. This really has been more of an academic exercise for me -- I avoid lockless. However, in hindsight, I think that this has been a really valuable conversation. I'm sometimes asked why we shouldn't/couldn't go lockless for one thing or another. I usually come back with something like "have you considered xxx". This dialog has provided me with more evidence for my case when those lockless discussions come up. Thank you all.
JMarsch
+2  A: 

At the CPU level, yes every processor will eventually see the change to the memory address. Even without locks or memory barriers. Locks and barriers would just ensure that it all happened in a relative ordering (w.r.t other instructions) such that it appeared correct to your program.

The problem isn't cache-coherency (I hope Joe Duffy's book doesn't make that mistake). The caches stay coherent - it is just that this takes time, and the processors don't bother to wait for that to happen - unless you enforce it. So instead, the processor moves on to the next instruction, which may or may not end up happening before the previous one (because each memory read/write may take a different amount of time. Ironically, because of the time it takes the processors to agree on coherency, etc., some cachelines become coherent faster than others (i.e. depending on whether the line was Modified, Exclusive, Shared, or Invalid, it takes more or less work to get into the necessary state).)

So a read may appear old or from an out-of-date cache, but really it just happened earlier than expected (typically because of look-ahead and branch prediction). When it really was read, the cache was coherent; it has just changed since then. So the value wasn't old when you read it, but it is now when you need it. You just read it too soon. :-(

Or equivalently, it was written later than the logic of your code thought it would be written.

Or both.

Anyhow, if this were C/C++, even without locks/barriers, you would eventually get the updated values (within a few hundred cycles typically, as memory takes about that long). In C/C++ you could use volatile (the weak non-thread volatile) to ensure that the value wasn't read from a register. (Now there's a non-coherent cache! i.e. the registers.)

In C# I don't know enough about CLR to know how long a value could stay in a register, nor how to ensure you get a real re-read from memory. You've lost the 'weak' volatile.

I would suspect as long as the variable access doesn't completely get compiled away, you will eventually run out of registers (x86 doesn't have many to start with) and get your re-read.

But no guarantees that I see. If you could limit your volatile read to a particular point in your code that runs often, but not too often (i.e. the start of the next task in a while(things_to_do) loop), then that might be the best you can do.
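Something like this sketch is what I mean (all the names here are made up):

public class Worker
{
  private int _setting; // written rarely, by one thread, via Thread.VolatileWrite

  public void Run(System.Collections.Generic.Queue<System.Action<int>> thingsToDo)
  {
    while (thingsToDo.Count > 0)
    {
      // One explicit volatile read per task: the barrier cost is paid
      // once per iteration instead of on every access to _setting.
      int setting = System.Threading.Thread.VolatileRead(ref _setting);
      thingsToDo.Dequeue()(setting); // the task runs against a local snapshot
    }
  }
}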

tony
I agree on the cache coherency, but I'm not sure I strictly agree with you about the processor moving on regardless. Most hardware implements CPU 'snooping', whereby the CPU examines both its store buffer and its cache and will ignore the incorrect cached value if the store buffer is newer.
zebrabox
I just mean in general that, for example, the CPU issues a read request then starts processing the next instruction (if there are no dependencies on the read - which is why it looks ahead and issues the reads early). And more so for writes: issue the write but don't wait for it to finish. The scary one is Invalidate messages - the CPU is told its cache line is invalid and it says sure, I hear you, but I'm going to temporarily ignore that if it doesn't affect me single-threadedly. This is how the Alpha can read *p 'before' reading p.
tony
Thanks Tony, and thanks for your additional comments on John's post.
JMarsch