Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX register", etc.

With modern CPUs that have 4 cores (or even more), at the machine code level does it just look like there are 4 separate CPUs (i.e. are there just 4 distinct "EDX" registers) ? If so, when you say "increment the EDX register", what determines which CPU's EDX register is incremented? Is there a "CPU context" or "thread" concept in x86 assembler now?

How does communication/synchronization between the cores work?

If you were writing an operating system, what mechanism is exposed via hardware to allow you to schedule execution on different cores? Is it some special privileged instruction(s)?

If you were writing an optimizing compiler/bytecode VM for a multicore CPU, what would you need to know specifically about, say, x86 to make it generate code that runs efficiently across all the cores?

You could summarize my question as "What changes have been made to x86 machine code to support multi-core functionality?"

Apologies that this question isn't very clear.

+1  A: 

The assembly code will translate into machine code that is executed on one core. If you want it to be multithreaded, you have to use operating system primitives to start this code several times on different processors, or to start different pieces of code on different cores; each core will then execute a separate thread. Each thread only sees the one core it is currently executing on.

sharptooth
I was going to say something like this, but then how does the OS allocate threads to cores? I imagine there are some privileged assembly instructions which accomplish this. If so, I think that is the answer the author is looking for.
A. Levy
There's no instruction for that; scheduling is the duty of the operating system scheduler. There are operating system functions like SetThreadAffinityMask in Win32, and code can call them, but that's operating system stuff that affects the scheduler; it's not a processor instruction.
sharptooth
There must be an OpCode or else the operating system wouldn't be able to do it either.
Matthew Whited
Not really an opcode for scheduling; it's more like you get one copy of the OS per processor, sharing a memory space. Whenever a core re-enters the kernel (via a syscall or interrupt), it looks at the same data structures in memory to decide which thread to run next.
pjc50
+8  A: 

As I understand it, each "core" is a complete processor, with its own register set. Basically, the BIOS starts you off with one core running, and then the operating system can "start" other cores by initializing them and pointing them at the code to run, etc.

Synchronization is done by the OS. Generally, each processor is running a different process for the OS, so the multi-threading functionality of the operating system is in charge of deciding which process gets to touch which memory, and what to do in the case of a memory collision.

Nicholas Flynt
which does raise the question though: what instructions are available to the operating system to do this?
Paul Hollingsworth
There's a set of privileged instructions for that, but that's the operating system's problem, not the application code's. If application code wants to be multithreaded, it has to call operating system functions to do the "magic".
sharptooth
+5  A: 

Each core executes from a different memory area. Your operating system will point a core at your program and the core will execute it. Your program will not be aware that there is more than one core, or which core it is executing on.

There are also no additional instructions available only to the operating system. These cores are identical to single-core chips. Each core runs a part of the operating system that handles communication through common memory areas used for information interchange, to find the next memory area to execute.

This is a simplification, but it gives you the basic idea of how it is done. Embedded.com has lots of information about multicores and multiprocessors ... this topic gets complicated very quickly!

Gerhard
+4  A: 

If you were writing an optimizing compiler/bytecode VM for a multicore CPU, what would you need to know specifically about, say, x86 to make it generate code that runs efficiently across all the cores?

As someone who writes optimizing compiler/bytecode VMs I may be able to help you here.

You do not need to know anything specifically about x86 to make it generate code that runs efficiently across all the cores.

However, you may need to know about cmpxchg and friends in order to write code that runs correctly across all the cores. Multicore programming requires the use of synchronisation and communication between threads of execution.

You may need to know something about x86 to make it generate code that runs efficiently on x86 in general.

There are other things it would be useful for you to learn:

You should learn about the facilities the OS (Linux or Windows or OSX) provides to allow you to run multiple threads. You should learn about parallelization APIs such as OpenMP and Threading Building Blocks, or OSX 10.6 "Snow Leopard"'s forthcoming "Grand Central".

You should consider if your compiler should be auto-parallelising, or if the author of the applications compiled by your compiler needs to add special syntax or API calls into his program to take advantage of the multiple cores.

Alex Brown
<pedantic>Snow Leopard is OS 10.6, not 10.7</pedantic> ;-)
A. Levy
fixed, thanks A.
Alex Brown
Don't several popular VMs like .NET and Java have the problem that their main GC process is covered in locks and fundamentally single-threaded?
Marco van de Voort
+16  A: 

This isn't a direct answer to the question, but it's an answer to a question that appears in the comments. Essentially, the question is what support the hardware gives to multi-threaded operation.

Nicholas Flynt had it right, at least regarding x86. In a multi-threaded environment (hyper-threading, multi-core, or multi-processor), the bootstrap thread (usually thread 0 in core 0 in processor 0) starts up fetching code from address 0xfffffff0. All the other threads start up in a special sleep state called Wait-for-SIPI (WFS). As part of its initialization, the primary thread sends a special inter-processor interrupt (IPI) over the APIC, called a SIPI (Startup IPI), to each thread that is in WFS. The SIPI contains the address from which that thread should start fetching code.

This mechanism allows each thread to execute code from a different address. All that's needed is software support for each thread to set up its own tables and messaging queues. The OS uses those to do the actual multi-threaded scheduling.

As far as the actual assembly is concerned, as Nicholas wrote, there's no difference between the assembly for a single-threaded and a multi-threaded application. Each logical thread has its own register set, so writing:

mov edx, 0

will only update EDX for the currently running thread. There's no way to modify EDX on another processor using a single assembly instruction. You need some sort of system call to ask the OS to tell another thread to run code that will update its own EDX.

Nathan Fellman
Thanks for filling the gap in Nicholas' answer. Have marked yours as the accepted answer now.... gives the specific details I was interested in... although it would be better if there was a single answer that had your information and Nicholas' all combined.
Paul Hollingsworth
A: 

What has been added on every multiprocessing-capable architecture compared to the single-processor variants that came before them are instructions to synchronize between cores. Also, you have instructions to deal with cache coherency, flushing buffers, and similar low-level operations an OS has to deal with. In the case of simultaneous multithreaded architectures like IBM POWER6, IBM Cell, Sun Niagara, and Intel "Hyperthreading", you also tend to see new instructions to prioritize between threads (like setting priorities and explicitly yielding the processor when there is nothing to do).

But the basic single-thread semantics are the same; you just add extra facilities to handle synchronization and communication with other cores.

jakobengblom2
A: 

It's not done in machine instructions at all; the cores pretend to be distinct CPUs and don't have any special capabilities for talking to one another. There are two ways they communicate:

  • they share the physical address space. The hardware handles cache coherency, so one CPU writes to a memory address which another reads.

  • they share an APIC (programmable interrupt controller). This is memory mapped into the physical address space, and can be used by one processor to control the others, turn them on or off, send interrupts, etc.

http://www.cheesecake.org/sac/smp.html is a good reference with a silly url.

pjc50
They don't in fact share an APIC. Each logical CPU has its own one. The APICs communicate between themselves, but they are separate.
Nathan Fellman