As per the title, plus what are the limitations and gotchas.

For example, on x86 processors, alignment for most data types is optional - an optimisation rather than a requirement. That means that a pointer may be stored at an unaligned address, which in turn means that pointer might be split over a cache line or page boundary.

Obviously a write could be split on any processor if you worked hard enough (picking out particular bytes etc), but not in a way where you'd still expect the write operation to be indivisible.

I seriously doubt that a multicore processor can guarantee that other cores see a consistent all-before or all-after view of a written pointer in this unaligned-write-crossing-a-boundary situation.

Am I right? And are there any similar gotchas I haven't thought of?

+2  A: 

The very notion of a single memory visible to all threads ceases to work with several cores having individual caches. StackOverflow questions on memory barriers may be of interest; say, this one.

I think an example to illustrate the problem with the "single memory" model is this one: Initially, x = y = 0.

Thread 1:

X = x;
y = 1;

Thread 2:

Y = y;
x = 1;

Of course there is a race condition; the deeper problem, beyond the obvious race, is that one possible outcome is X=1, Y=1 - even without compiler optimizations (even if you write the above two threads in assembly).

Pascal Cuoq
Very interesting - I hadn't encountered the term "memory barrier" until now.
Steve314
Nice example - the point might be clearer, though, if you indicate that all four variables have some other value at the start. I assume the point relates to out-of-order execution?
Steve314
Well, I mentioned that `x = y = 0`. X and Y are both assigned without ever being read, so their initial values are not really relevant, but you could assume they are nil too if you like. To start on memory barriers, a good place is http://en.wikipedia.org/wiki/Memory_barrier , as always. And yes, it's all the fault of out-of-order execution.
Pascal Cuoq
I'm just being a bit daft really - my first thought was basically "so maybe you started with X = Y = 1, so what?".
Steve314
A: 

Is it possible for this class to output “False”?

using System;
using System.Threading;

class Unsafe
{
  static bool underwearOn, trousersOn;

  static void Main( )
  {
    new Thread(Wait).Start( );    // Start up the busy waiter
    Thread.Sleep(1000);           // Give it a second to start up!

    underwearOn = true;
    trousersOn = true;
  }

  static void Wait(  )
  {
    while (!trousersOn)
        ; // Spin until trousersOn
    Console.Write(underwearOn);
  }
}

Yes, on multicore machines. Value types, such as bools, can be stored in machine registers, and the order in which registers are synchronized is machine-specific. underwearOn could be synchronized before trousersOn.

You could lock the assignments and while loop, but this will harm performance. A better solution is to declare the bool variables volatile. Such variables are not stored in registers.

Edit:

This is a simplified example from a presentation available at Threading Complete.

Dour High Arch
I'm confused by the claim that "the order registers are synchronized is machine-specific". If these statics are in registers, surely that's because the compiler chose to put them there, meaning the compiler also chose when to write back the changes? ie the synchronisation order is compiler-specific?
Steve314
No, synchronizing registers across cores is determined by the hardware, not the compiler. The compiler doesn't know if object files will be running on a multicore machine or not.
Dour High Arch
But the registers are local to a particular core. They only get "synchronised" when a machine code instruction is used to write the register out to main memory. That instruction (like the one that read the value into the register in the first place) was generated by the compiler.
Steve314
The instructions (CMP or whatever) are generated by the compiler. Memory and registers are modified by the CPU. Behavior can be different for single or multi-core CPUs, but the compiler won't know what kind of CPU will be executing the compare instruction.
Dour High Arch
The static variables are stored in main memory, not in core-local registers - at least until the compiler decides to copy them into registers in order to handle a particular piece of code. Irrespective of whether the code runs single-core or multi-core, code generated by the compiler moves the variables into and out of the registers. The hardware executes that code, but you can say that about anything. As for the context switch with multithreading on a single core - registers get saved, but no "register synchronisation" between threads occurs. The register values remain local to the thread.
Steve314
Indeed they do remain local to the thread. When thread 1 sets underwearOn = true, it remains local to thread 1, thread 2's underwearOn remains false. The same for trousersOn, until at some indeterminate time in the future, the CPU synchronizes registers across threads. The order this is done is not defined, it could synchronize trousersOn first. Then thread 2 could read trousersOn, see it's true, and read the unsynchronized underwearOn. It's false.
Dour High Arch
Also, it is not true that "variables are stored in main memory". The compiler is perfectly free to store them only in registers, not RAM. Optimizing compilers often do this for variables with high locality whose addresses are never used.
Dour High Arch
http://stackoverflow.com/questions/2384578/what-are-cpu-registers-and-how-are-they-used-particularly-wrt-multithreading/2384609#2384609
Steve314
+1  A: 

Maybe I misunderstand the example, but the "unaligned pointer" problem is the same as on a single-core execution. If a datum can be partially written to memory then different threads can see partial updates (if there's no appropriate locking) on any machine with preemptive multitasking (even on a single-CPU system).

You don't have to worry about the cache unless you are writing drivers for DMA-capable peripherals. Modern multi-processors are cache coherent, so the hardware guarantees that a thread on processor A will have the same view of memory as a thread on processor B. If the thread on A reads a memory location that is cached on B, then the thread on A will get the correct value from B's cache.

You do have to worry about values in registers. From a programming standpoint that difference may not be visible, but in my opinion involving the cache in a concurrency discussion often just introduces unnecessary confusion.

Any operation that is labeled "indivisible" by the programming manual for an ISA must reasonably keep being indivisible in a multiprocessing system built with processors using that ISA, or backwards compatibility would break. However, this does not mean that operations that were never promised to be indivisible, but happened to be in a particular processor implementation, will be indivisible in future implementations (such as in a multiprocessor system).

[Edit] In answer to the comment below:

  1. Anything written to memory will be coherently visible to all threads, regardless of the number of cores (in a cache coherent system).
  2. Anything written to memory non-atomically can end up being partially read by unsynchronized threads in the presence of preemption (even on a single-core system).

If the pointer is written to an unaligned address in a single, atomic write then the cache coherence hardware will make sure that all threads see it completed, or not at all. If the pointer is written non-atomically (such as with two separate write operations) then any threads may see the partial update even on a single-core system with true preemption.

Per Ekman
I always thought that the O/S would force the cache to flush on a context switch, making unaligned operations safe for single-processor multi-threading. Is that not true?
Steve314
Not generally, no. That would be quite wasteful, especially for threading (as opposed to "real" processes) that shares an address space. I think the question can be answered in two parts. See above (I ran out of comment space).
Per Ekman
+1  A: 

On x86 the answer is yes: if the assembly operation is preceded by a `lock` prefix, then the processor asserts a hardware signal that ensures the following instruction is atomic (in some processors the caches coordinate to ensure the operation is atomic).

Making operations atomic is not something compilers do automatically; on multiprocessor systems atomic assembly language operations are very expensive, and they are generally used to implement the locking primitives offered by the OS / C library.

No purely high level language memory operations should be regarded as atomic. If you have multiple threads writing to the same shared memory location then you need to use some mutex/lock mechanism to avoid races.

Andrew Roca