As per the title, plus what are the limitations and gotchas.

For example, on x86 processors, alignment for most data types is optional - an optimisation rather than a requirement. That means that a pointer may be stored at an unaligned address, which in turn means that pointer might be split over a cache line or page boundary.

Obviously a write could be split on any processor if you worked hard enough (picking out particular bytes etc), but not in a way where you'd still expect the write operation to be indivisible.

I seriously doubt that a multicore processor can guarantee that other cores see a consistent all-before or all-after view of a written pointer in this unaligned-write-crossing-a-boundary situation.

Am I right? And are there any similar gotchas I haven't thought of?

+2  A: 

The very notion of a single memory visible to all threads ceases to work with several cores having individual caches. StackOverflow questions on memory barriers may be of interest; say, this one.

I think an example to illustrate the problem with the "single memory" model is this one: Initially, x = y = 0.

Thread 1:

X = x;
y = 1;

Thread 2:

Y = y;
x = 1;

Of course there is a race condition; the deeper problem, beyond the obvious race, is that one possible outcome is X=1, Y=1 - even without compiler optimizations (even if you write the above two threads in assembly).

Pascal Cuoq
Very interesting - I hadn't encountered the term "memory barrier" until now.
Steve314
Nice example - the point might be clearer, though, if you indicate that all four variables have some other value at the start. I assume the point relates to out-of-order execution?
Steve314
Well, I mentioned that `x = y = 0`. X and Y are both assigned without ever being read, so their initial values are not really relevant, but you could assume they are nil too if you like. To start on memory barriers, a good place is http://en.wikipedia.org/wiki/Memory_barrier , as always. And yes, it's all the fault of out-of-order execution.
Pascal Cuoq
I'm just being a bit daft really - my first thought was basically "so maybe you started with X = Y = 1, so what?".
Steve314
A: 

Is it possible for this class to output “False”?

using System;
using System.Threading;

class Unsafe
{
  static bool underwearOn, trousersOn;

  static void Main( )
  {
    new Thread(Wait).Start( );    // Start up the busy waiter
    Thread.Sleep(1000);           // Give it a second to start up!

    underwearOn = true;
    trousersOn = true;
  }

  static void Wait(  )
  {
    while (!trousersOn)
        ; // Spin until trousersOn
    Console.Write(underwearOn);
  }
}

Yes, on multicore machines. Value types, such as bools, can be stored in machine registers, and the order in which registers are synchronized is machine-specific. underwearOn could be synchronized before trousersOn.

You could lock the assignments and while loop, but this will harm performance. A better solution is to declare the bool variables volatile. Such variables are not stored in registers.

Edit:

This is a simplified example from a presentation available at Threading Complete.

Dour High Arch
I'm confused by the claim that "the order registers are synchronized is machine-specific". If these statics are in registers, surely that's because the compiler chose to put them there, meaning the compiler also chose when to write back the changes? ie the synchronisation order is compiler-specific?
Steve314
No, synchronizing registers across cores is determined by the hardware, not the compiler. The compiler doesn't know if object files will be running on a multicore machine or not.
Dour High Arch
But the registers are local to a particular core. They only get "synchronised" when a machine code instruction is used to write the register out to main memory. That instruction (like the one that read the value into the register in the first place) was generated by the compiler.
Steve314
The instructions (CMP or whatever) are generated by the compiler. Memory and registers are modified by the CPU. Behavior can be different for single or multi-core CPUs, but the compiler won't know what kind of CPU will be executing the compare instruction.
Dour High Arch
The static variables are stored in main memory, not in core-local registers - at least until the compiler decides to copy them into registers in order to handle a particular piece of code. Irrespective of whether the code runs single-core or multi-core, code generated by the compiler moves the variables into and out of the registers. The hardware executes that code, but you can say that about anything. As for the context switch with multithreading on a single core - registers get saved, but no "register synchronisation" between threads occurs. The register values remain local to the thread.
Steve314
Indeed they do remain local to the thread. When thread 1 sets underwearOn = true, it remains local to thread 1, thread 2's underwearOn remains false. The same for trousersOn, until at some indeterminate time in the future, the CPU synchronizes registers across threads. The order this is done is not defined, it could synchronize trousersOn first. Then thread 2 could read trousersOn, see it's true, and read the unsynchronized underwearOn. It's false.
Dour High Arch
Also, it is not true that "variables are stored in main memory". The compiler is perfectly free to store them only in registers, not RAM. Optimizing compilers often do this for variables with high locality whose addresses are never used.
Dour High Arch
http://stackoverflow.com/questions/2384578/what-are-cpu-registers-and-how-are-they-used-particularly-wrt-multithreading/2384609#2384609
Steve314
+1  A: 

Maybe I misunderstand the example, but the "unaligned pointer" problem is the same as on a single-core execution. If a datum can be partially written to memory then different threads can see partial updates (if there's no appropriate locking) on any machine with preemptive multitasking (even on a single-CPU system).

You don't have to worry about the cache unless you are writing drivers for DMA-capable peripherals. Modern multi-processors are cache coherent, so the hardware guarantees that a thread on processor A will have the same view of memory as a thread on processor B. If the thread on A reads a memory location that is cached on B, then the thread on A will get the correct value from B's cache.

You do have to worry about values in registers. From a programming standpoint that difference may not be visible, but in my opinion involving the cache in a concurrency discussion often just introduces unnecessary confusion.

Any operation that is labeled "indivisible" by the programming manual for an ISA must reasonably keep being indivisible in a multiprocessing system built with processors using that ISA, or backwards compatibility would break. However, this does not mean that operations that were never promised to be indivisible, but happened to be in a particular processor implementation, will be indivisible in future implementations (such as in a multiprocessor system).

[Edit] In answer to the comment below:

  1. Anything written to memory will be coherently visible to all threads, regardless of the number of cores (in a cache coherent system).
  2. Anything written to memory non-atomically can end up being partially read by unsynchronized threads in the presence of preemption (even on a single-core system).

If the pointer is written to an unaligned address in a single, atomic write then the cache coherence hardware will make sure that all threads see it completed, or not at all. If the pointer is written non-atomically (such as with two separate write operations) then any threads may see the partial update even on a single-core system with true preemption.

Per Ekman
I always thought that the O/S would force the cache to flush on a context switch, making unaligned operations safe for single-processor multi-threading. Is that not true?
Steve314
Not generally, no. That would be quite wasteful, especially for threading (as opposed to "real" processes) that shares an address space. I think the question can be answered in two parts. See above (I ran out of comment space).
Per Ekman
+1  A: 

On x86 the answer is yes: if the assembly operation is preceded by a `lock` prefix, then the processor asserts a hardware signal that ensures the following instruction is atomic (in some processors the caches coordinate to ensure the operation is atomic).

Making operations atomic is not something compilers do automatically; on multiprocessor systems atomic assembly language operations are very expensive, and they are generally used to implement the locking primitives offered by the OS / C library.

No purely high level language memory operations should be regarded as atomic. If you have multiple threads writing to the same shared memory location then you need to use some mutex/lock mechanism to avoid races.

Andrew Roca