views: 1468
answers: 7
I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms.

I also understand context switching. That is, with only one CPU, the operating system schedules time for each process (and there are tons of scheduling algorithms out there), and thereby we achieve running multiple processes simultaneously.

And now that we have multi-core processors (or multi-processor computers), we could have two processes running simultaneously on two separate cores.

My question is about the last scenario: how does the kernel control which core a process runs on? Which system calls (in Linux, or even Windows) schedule a process on a specific core?

The reason I'm asking: I'm working on a project for school where we are to explore a recent topic in computing - and I chose multi-core architectures. There seems to be a lot of material on how to program in that kind of environment (how to watch for deadlock or race conditions) but not much on controlling the individual cores themselves. I would love to be able to write a few demonstration programs and present some assembly instructions or C code to the effect of "See, I am running an infinite loop on the 2nd core, look at the spike in CPU utilization for that specific core".

Any code examples? Or tutorials?

edit: For clarification - many people have said that this is the purpose of the OS, and that one should let the OS take care of this. I completely agree! But then what I'm asking (or trying to get a feel for) is what the operating system actually does to do this. Not the scheduling algorithm, but more "once a core is chosen, what instructions must be executed to have that core start fetching instructions?"

A: 

The OS knows how to do this; you do not have to. You could run into all sorts of issues if you specified which core to run on, some of which could actually slow the process down. Let the OS figure it out; you just need to start the new thread.

For example, if you told a process to start on core x, but core x was already under a heavy load, you would be worse off than if you had just let the OS handle it.

Ed Swangren
+15  A: 

Normally the decision about which core an app will run on is made by the system. However, you can set the "affinity" for an application to a specific core to tell the OS to only run the app on that core. Normally this isn't a good idea, but there are some rare cases where it might make sense.

To do this in Windows, use Task Manager, right-click on the process, and choose "Set Affinity". You can do it programmatically in Windows using functions like SetThreadAffinityMask, SetProcessAffinityMask, or SetThreadIdealProcessor.
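For a concrete feel, here is a minimal sketch of the programmatic route (an illustration only; the choice of core 0 is arbitrary): restrict the current process to one core and spin, and only that core's graph should spike in Task Manager.

    /* Minimal sketch: restrict the current process to logical CPU 0,
       then spin so that core's utilization visibly spikes.
       Error handling is trimmed for brevity. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD_PTR mask = 1; /* bit 0 set => only logical processor 0 */

        if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
            fprintf(stderr, "SetProcessAffinityMask failed: %lu\n",
                    (unsigned long)GetLastError());
            return 1;
        }

        for (;;)            /* watch Task Manager: one core pegged */
            ;
    }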

ETA:

If you are interested in how the OS actually does the scheduling, you might want to check out these links:

Wikipedia article on context switching

Wikipedia article on scheduling

Scheduling in the Linux kernel

On most modern OSes, the OS schedules a thread to execute on a core for a short slice of time. When the time slice expires, or when the thread does an I/O operation that causes it to voluntarily yield the core, the OS will schedule another thread to run on that core (if there are any threads ready to run). Exactly which thread is scheduled depends on the OS's scheduling algorithm.

The implementation details of exactly how the context switch occurs are CPU & OS dependent. It generally will involve a switch to kernel mode, the OS saving the state of the previous thread, loading the state of the new thread, then switching back to user mode and resuming the newly loaded thread. The context switching article I linked to above has a bit more detail about this.
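As a toy illustration of the "voluntarily yield" case above (sched_yield is POSIX; note that a loop like this still burns CPU, it merely offers the core back to the scheduler on every iteration):

    /* Toy illustration of a thread voluntarily giving up its core. */
    #include <sched.h>

    int main(void)
    {
        for (;;) {
            /* ... do a small chunk of work ... */
            sched_yield();  /* back to the run queue; the scheduler may
                               now pick another ready thread for this core */
        }
    }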

Eric Petroelje
+1 for providing the easy way to do it.
Ed Swangren
Note that the affinity mask is inherited by child processes, so if you set it on Explorer, all launched applications will also be using a subset of the available processors.
Richard
+1  A: 

I don't know the assembly instructions, but the Windows API function is SetProcessAffinityMask. You can see an example of something I cobbled together a while ago to run Picasa on only one core.

Will Rickards
+1  A: 

As others have mentioned, it's controlled by the operating system. Depending on the OS, it may or may not provide you with system calls that allow you to affect what core a given process executes on. However, you should usually just let the OS do the default behavior. If you have a 4-core system with 37 processes running, and 34 of those processes are sleeping, it's going to schedule the remaining 3 active processes onto separate cores.

You'll likely only see a speed boost from playing with core affinities in very specialized multithreaded applications. For example, suppose you have a system with two dual-core processors. Suppose you have an application with 3 threads, two of which operate heavily on the same set of data, whereas the third thread uses a different set of data. In this case, you would benefit the most by having the two interacting threads run on the same processor and the third thread on the other processor, since then the first two can share a cache. The OS has no idea what memory each thread needs to access, so it may not allocate threads to cores appropriately.
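To make that concrete, here is a hedged sketch using pthread_setaffinity_np, a GNU extension on Linux; which core numbers actually share a cache is machine-dependent, so the numbers below are assumptions:

    /* Sketch for Linux/glibc; pthread_setaffinity_np is a GNU extension.
       The core numbers 0, 1 and 2 are assumptions -- check which cores
       actually share a cache on your machine first. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        (void)arg;
        /* ... operate on the shared or the private data set ... */
        return NULL;
    }

    static void pin(pthread_t t, int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        int rc = pthread_setaffinity_np(t, sizeof(set), &set);
        if (rc != 0)
            fprintf(stderr, "pthread_setaffinity_np: error %d\n", rc);
    }

    int main(void)
    {
        pthread_t a, b, c;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_create(&c, NULL, worker, NULL);

        pin(a, 0); /* the two data-sharing threads on two cores that */
        pin(b, 1); /* (we assume) share a cache on the same package  */
        pin(c, 2); /* the independent thread on the other processor  */

        pthread_join(a, NULL);
        pthread_join(b, NULL);
        pthread_join(c, NULL);
        return 0;
    }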

If you're interested in how the operating system does this, read up on scheduling. The nitty-gritty details of multiprocessing on x86 can be found in the Intel 64 and IA-32 Architectures Software Developer's Manuals. Volume 3A, Chapters 7 and 8 contain relevant information, but bear in mind these manuals are extremely technical.

Adam Rosenfield
+9  A: 

As others have mentioned, processor affinity is Operating System specific. If you want to do this outside the confines of the operating system, you're in for a lot of fun, and by that I mean pain.

That said, others have mentioned SetProcessAffinityMask for Win32. Nobody has mentioned the Linux kernel's way to set processor affinity, and so I shall: you need to use the sched_setaffinity system call. Here's a nice tutorial on how.
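For the kind of demo the question asks for ("an infinite loop on the 2nd core"), a minimal sketch looks like this; the assumption that the second core is numbered CPU 1 varies by machine:

    /* Minimal sketch: pin this process to CPU 1 (the "2nd core" -- the
       numbering is an assumption) and spin. Watch per-core usage with
       `top` (press 1) or mpstat. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(1, &set);                 /* allow CPU 1 only */

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(set), &set) == -1) {
            perror("sched_setaffinity");
            return 1;
        }

        for (;;)                          /* infinite loop on that core */
            ;
    }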

Randolpho
I wrote an article on this topic a while back, but it was written in Slovak, so I guess that wouldn't help the person asking :) Anyway, your answer goes in the right direction, so I am deffo giving you a vote up :-)
Peter Perháč
+2  A: 

The OpenMPI project has a library (PLPA, the Portable Linux Processor Affinity library) to set processor affinity on Linux in a portable way.

A while back, I used this in a project and it worked fine.

Caveat: I dimly remember that there were some issues in finding out how the operating system numbers the cores. I used this on a system with two Xeon CPUs of 4 cores each.

A look at cat /proc/cpuinfo might help. On the box I used, the numbering is pretty weird; boiled-down output is at the end.

Evidently, the even-numbered cores are on the first CPU and the odd-numbered cores are on the second CPU. However, if I remember correctly, there was an issue with the caches. On these Intel Xeon processors, two cores on each CPU share their L2 caches (I do not remember whether the processor has an L3 cache). I think that the virtual processors 0 and 2 shared one L2 cache, 1 and 3 shared one, 4 and 6 shared one, and 5 and 7 shared one.

Because of this weirdness (1.5 years back I could not find any documentation on the processor numbering in Linux), I would be careful about doing this kind of low-level tuning. However, there clearly are some uses. If your code runs on only a few kinds of machines, then it might be worth doing this kind of tuning. Another application would be in some domain-specific language like StreamIt, where the compiler could do this dirty work and compute a smart schedule.

processor       : 0
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 1
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 2
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 3
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 4
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 5
physical id     : 1
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 6
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

processor       : 7
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
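As an illustration, here is a naive sketch that recovers the processor-to-(package, core) mapping from output like the above; the parsing assumes the field layout shown:

    /* Naive sketch: recover the processor -> (package, core) mapping from
       /proc/cpuinfo. Assumes the field layout shown above; real files
       contain many more fields, which this simply skips. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        char line[256];
        int processor = -1, physical = -1, core = -1;

        if (!f) {
            perror("/proc/cpuinfo");
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            if (sscanf(line, "processor : %d", &processor) == 1)
                continue;
            if (sscanf(line, "physical id : %d", &physical) == 1)
                continue;
            if (sscanf(line, "core id : %d", &core) == 1)
                printf("cpu %d -> package %d, core %d\n",
                       processor, physical, core);
        }
        fclose(f);
        return 0;
    }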
Manuel
+2  A: 

Nothing tells a core, "now start running this process".

The core does not see processes; it only knows about executable code and the various privilege levels, with their associated limitations on which instructions can be executed.

When the computer boots, for the sake of simplicity only one core/processor is active and actually runs any code. Then, if the OS is multiprocessor-capable, it activates the other cores with some system-specific mechanism (on x86, the bootstrap processor wakes each additional core by sending it INIT and STARTUP inter-processor interrupts through the local APIC); the other cores most likely pick up from exactly the same spot as the first core and run from there.

So what the scheduler does is look through the OS's internal structures (the task/process/thread queue), pick one, and mark it as running on its core. Scheduler instances running on other cores then won't touch that task until it is in a waiting state again (and not marked as pinned to a specific core). After the task is marked as running, the scheduler executes the switch to userland, with the task resuming at the point where it was previously suspended.
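To make that description concrete, here is a purely conceptual sketch of such a per-core scheduler loop. Every type and function in it is hypothetical, not a real kernel API; real implementations (e.g. Linux's schedule()) are far more involved:

    /* Purely conceptual sketch of a per-core scheduler loop.
       All names here are hypothetical stand-ins, not real kernel code. */
    #include <stddef.h>

    enum state { READY, RUNNING, WAITING };

    struct task     { enum state state; struct task *next; /* saved regs... */ };
    struct runqueue { struct task *head; };
    struct core     { struct runqueue *rq; struct task *current; };

    /* Policy: pick a READY task (a real one also honors affinity masks). */
    static struct task *pick_next(struct runqueue *rq)
    {
        for (struct task *t = rq->head; t; t = t->next)
            if (t->state == READY)
                return t;
        return NULL;
    }

    /* In a real kernel this saves prev's registers, loads next's, switches
       stacks/address space; the eventual return to user mode (iret/sysret
       on x86) makes the core start fetching next's instructions. */
    static void context_switch(struct task *prev, struct task *next)
    {
        (void)prev; (void)next;
    }

    static void cpu_idle(void) { /* e.g. hlt until the next interrupt */ }

    void schedule_on_this_core(struct core *self)
    {
        for (;;) {
            struct task *next = pick_next(self->rq);
            if (next) {
                next->state = RUNNING;   /* schedulers on other cores skip it */
                context_switch(self->current, next);
                self->current = next;
            } else {
                cpu_idle();
            }
        }
    }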

Technically there is nothing whatsoever stopping cores from running the exact same code at the exact same time (and many unlocked functions do), but unless the code is written to expect that, it will probably piss all over itself.

The scenario gets weirder with more exotic memory models (the above assumes the "usual" single linear working memory space), where cores don't necessarily all see the same memory and there may be requirements on fetching code out of another core's clutches; but this is much easier to handle by simply keeping the task pinned to a core (AFAIK the Sony PS3 architecture, with its SPUs, is like that).

Pasi Savolainen