Take a boolean value. One thread is trying to assign a pre-determined value to it, and at exactly the same time another thread is trying to read it. What happens?

+2  A: 

It's a race condition: one thread will win, the other will lose. Which one wins is not determinable in practice.

Note that "winning" is not necessarily a good thing, nor is there always a guarantee that the value will not be corrupted (at least in general).

Steven Sudit
In many situations, it is not that one wins and the other loses. It is possible that the reader can end up reading garbage, where only a portion of the data has changed. The behavior is simply undefined unless you are specific about the environment (processor, compiler, OS, etc.)
sbass
@sbass: That's certainly possible in the general case, but we're talking about a Boolean value stored in a byte or machine word. There's just no way for part of it to be in one state while another part isn't.
Steven Sudit
Unless you can point out an error in my reasoning, I recommend that you reverse the downvote.
Steven Sudit
My concern with your answer was the idea of a winner and loser. On some machines, you can't count on access to even a boolean being atomic. It might be stored as a 32-bit int, on a processor where such accesses are not atomic. It's up to the compiler. I have been bitten by such errors, and it is dangerous to assume anything here. I'd reverse the downvote if it bothers you and I could figure out how (it's the first downvote I ever made), but the system does not seem to want to allow me to do so.
sbass
@sbass: Again, I don't dispute that this is possible in theory. In practice, it has never been the case for any environment I worked in. You're right, though, that "winning" isn't necessarily a victory; consider the thread with Reed Copsey. As for reversing the downvote, you're not obligated to do so, particularly if you stand by it. However, if you still wish to, I've touched my answer so that you'll be able to.
Steven Sudit
@Steven Sudit: What about if the value straddles a cache line boundary, or even worse, a page boundary? I don't think even x86 guarantees atomicity in that case. Of course, you'd have to make an effort to shoot yourself in the foot in that particular way, but a quick reading of thedailywtf should dispel the notion that "no one would be that crazy, right?". And, of course, a boolean in C and C++ is AFAIK without exception represented as the integer value 0 or 1, so in this case it doesn't matter if you mix up the low- and high-order bits. But for a general integer value I believe it holds.
janneb
@janneb: A multibyte value should be aligned based on its size, which means it will not cross a cache line or page boundary. But, yes, if you do enough things wrong, atomicity will break down.
Steven Sudit
A: 

I believe the issue depends on the architecture. Setting that aside, I always think of it in terms of a single CPU executing low-level instructions, with the operating system able to switch out which thread is doing so at any moment. If you absolutely know that a single machine instruction is what it takes to write a value, and a single instruction is what it takes to read one, then I suppose there's no chance of an 'intermediate state'.

That said, I'm not sure how what I said is complicated by multiple processors. I've always wondered actually...

Gabriel
"If you absolutely know that one machine instruction is what it takes to write a value and one instruction is what it takes to read a value".. That's not always true. CPU cache lines may be out of date, as well.
Reed Copsey
Since a Boolean is normally stored in the smallest unit of memory (either a single byte or a single machine word), there's no way for part of it to have one value while another part has a different value. Therefore, the reading thread will either see the valid previous value or the valid next value, even if they're each running on their own core. If it's multithreading with a single core, it depends entirely on when a thread gets pre-empted. Either way, it's a race.
Steven Sudit
@Reed: In which case, it will read a stale value, so the reading thread will "win" the race.
Steven Sudit
@Steven: Yes - though it's misleading, since the reading thread may "win" the race for a period of time, and then refresh its cache line and read a different value long after the writing thread updated...
Reed Copsey
Are you basically saying that one instruction doesn't always take the same number of ticks on the CPU's clock, and the thread can switch at the granularity of a single tick? I'm just curious - I'm sure you know more about it than I do.
Gabriel
@Reed: Yes, that's why I put the scare-quotes around "win". It's not much of a victory.
Steven Sudit
@Gabriel: In practice, a CPU can assert a lock on the shared memory bus, causing the others to experience an extra wait state that burns a cycle. So, yes, a load or store could take longer. Then add the complication of caches, where the read could be satisfied by a cache hit so it doesn't even need to go to the bus, at least until its cache is invalidated.
Steven Sudit
+3  A: 

Take a boolean value. One thread is trying to assign a pre-determined value to it, and at exactly the same time another thread is trying to read it. What happens?

This depends on the architecture and language in question. In practice, this typically means you'll have a race condition, and the "read" value may be the old or new value.

It's more interesting if two threads try to write to the same variable at the same time - if one writes false and the other true, the resulting variable may end up with either value.

In order to guarantee proper access, a memory barrier needs to be in place. This is especially true if dealing with multiple processors or even multiple cores, as the CPU cache lines need to be invalidated after a thread writes in order for a separate thread to read a value. Most languages have support for this in one form or another - for example, in C#, you can flag a variable as volatile in order to assist in this, but explicit synchronization (ie: locking) is typically still required.

Also, if the variable write is not a single CPU instruction (ie: using an interlocked operation or similar), care must be taken to explicitly synchronize the access for read and write - otherwise, you can (on some architectures) get the variable in a third, indeterminate state.

Reed Copsey
This is a comprehensive answer.
Steven Sudit
So long as it doesn't crash I'm happy. Thanks.
Artfunkel
Memory barriers are not needed to invalidate cache lines; cache coherency protocols work just fine in the absence of barrier instructions. Rather, what barriers are used for is preventing the processor from reordering memory operations (which can include things like waiting for a cache coherency transaction to complete). See e.g. www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf
janneb
@janneb: You're right. Cache coherency ensures that the CPUs will eventually agree about what the contents of memory are, but it doesn't avoid race conditions. Barriers can ensure that this mutual agreement is reached before taking action, which avoids the problems caused by reordering. If anything, reordering may be a bigger threat to stability than simple race conditions.
Steven Sudit
We have to be careful how we define what a memory barrier does. I'm not so sure Reed's statement was incorrect. Remember, there are two memory models in play: the hardware's and the CLR's. It *may* be possible that some implementations do model reads and writes using caching techniques that require invalidation. Just thinking out loud here.
Brian Gideon
@Brian: I think it's more fruitful to think of it in a slightly different way: The CPU architecture documentation specifies a memory consistency model, and then it's up to the CPU/memory/interconnect designers to come up with a system that fulfills the guarantees given by the model (replace CPU with CLR/JVM/etc if you wish). Now, is it possible to have a platform architecture that requires the programmer to explicitly invalidate cache lines? Sure, but I'm not aware of any.
janneb
Back in the day, there used to be supercomputer architectures with non-cache-coherent shared memory, but AFAIK you couldn't run general multi-threaded software on them. Instead non-local memory was accessed via some SHMEM style API.
janneb
@janneb: I'm totally with you on that. And my example of the .NET CLR isn't a very good one anyway, because I'm pretty sure the specification spells out the memory model in terms of instruction ordering, so obviously the rules of memory barriers are similar (despite being weaker) to what they are at the hardware level. Like you, I'm not aware of any counterexamples.
Brian Gideon
+1  A: 

Nothing to add re what happens. Note, however, that there are typically hardware-level mechanisms to guarantee atomic access to memory at a certain address, of a certain length, in this scenario. For small values, this is more efficient than a user-mode lock like a CRITICAL_SECTION.

Check out the .NET Framework Interlocked class or the Interlocked* APIs in Win32.

Steve Townsend
Yes, on Intel this works by using a LOCK prefix (although it can be implicit).
Steven Sudit