views: 148
answers: 2

In a Windows operating system with 2 physical x86/amd64 processors (P0 + P1), running 2 processes (A + B), each with two threads (T0 + T1), is it possible (or even common) to see the following:

P0:A:T0 running at the same time as P1:B:T0

then, after 1 (or is that 2?) context switch(es?)

P0:B:T1 running at the same time as P1:A:T1

In a nutshell, I'd like to know if - on a multiple processor machine - the operating system is free to schedule any thread from any process at any time, regardless of what other threads from other processes are already running.

EDIT: To clarify the silly example, imagine that process A's thread A:T0 has affinity to processor P0 (and A:T1 to P1), while process B's thread B:T0 has affinity to processor P1 (and B:T1 to P0). It probably doesn't matter whether these processors are cores or sockets.
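In Win32 terms, the setup I have in mind would look roughly like this (just a sketch; the thread handles are placeholders for the four threads above, and in practice each process would set this on its own threads):

    // Sketch only: hA0/hA1 are handles to process A's threads T0/T1,
    // hB0/hB1 to process B's threads T0/T1.
    #include <windows.h>

    void PinExampleThreads(HANDLE hA0, HANDLE hA1, HANDLE hB0, HANDLE hB1)
    {
        SetThreadAffinityMask(hA0, 1);   // A:T0 -> P0 (bit 0)
        SetThreadAffinityMask(hA1, 2);   // A:T1 -> P1 (bit 1)
        SetThreadAffinityMask(hB0, 2);   // B:T0 -> P1
        SetThreadAffinityMask(hB1, 1);   // B:T1 -> P0
    }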

Is there a first-class concept of a process context switch? Perfmon shows context switches under the Thread object, but nothing under the Process object.

+3  A: 

Yes, it is possible and it happens pretty often.
The OS tries not to move a thread between CPUs (you can make it try harder by setting the thread's preferred ("ideal") processor, or you can even lock it to a single processor via its affinity mask; see the sketch below).
A Windows process is not an execution unit by itself - from this viewpoint, it's basically just a context for its threads.
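For example, a minimal sketch of the two knobs mentioned above - a soft "ideal processor" hint versus a hard affinity lock (processor 0 here is an arbitrary choice):

    #include <windows.h>

    void KeepCurrentThreadOnProcessor0()
    {
        // Soft hint: the scheduler will prefer logical processor 0,
        // but may still run the thread elsewhere.
        SetThreadIdealProcessor(GetCurrentThread(), 0);

        // Hard restriction: the thread may only ever run on logical processor 0.
        SetThreadAffinityMask(GetCurrentThread(), 1);   // bit 0 = processor 0
    }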

EDIT (further clarifications)

There's nothing like a "process context switch". Basically, the OS scheduler assigns threads via a (very adaptive) round-robin algorithm to any free processor/core (as affinity allows) if the "previous" processor isn't immediately available, regardless of which process the threads belong to (which means multi-threaded processes can grab a much larger share of CPU time).

This "jumping" may seem expensive, considering at least the L1 (and sometimes L2) caches are per-core (apart from different slot/package processors), but it's still cheaper than delays caused by waiting to the "right" processor and inability to do elaborate load-balancing (which the "jumping" scheme makes possible).
This may not apply to the NUMA architecture, but there are much more considerations invoved (e.g. adapting all memory-allocations to be thread- and processor-bound and avoiding as much state/memory sharing as possible).

As for affinity: you can set affinity masks per thread or per process (the latter supersedes all the process' threads' settings), but the OS enforces at least one logical processor per thread (you can never end up with a zero affinity mask).

A process' default affinity mask is inherited from its parent process (which allows you to create single-core loaders for problematic legacy executables), and threads inherit the mask from the process they belong to.

You may not set a thread's affinity to include a processor outside the process' affinity mask, but you can restrict it further within that mask.
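A rough illustration of those rules (error handling omitted; it assumes at least three logical processors so the last call has something to fail against):

    #include <windows.h>

    void AffinityRulesSketch()
    {
        DWORD_PTR processMask = 0, systemMask = 0;
        GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask);

        // Limit the whole process (and thereby all of its threads) to P0 and P1.
        SetProcessAffinityMask(GetCurrentProcess(), 0x3);

        // A thread may be restricted further, but only within the process mask:
        SetThreadAffinityMask(GetCurrentThread(), 0x1);  // OK: P0 is a subset
        SetThreadAffinityMask(GetCurrentThread(), 0x4);  // fails (returns 0): P2 is outside
    }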

By default, any thread will jump between the available logical processors (especially when it yields, calls into the kernel, etc.). It may jump even if it has a preferred processor set, but only if it has to, and it will NOT jump to a processor outside its affinity mask (which may lead to considerable delays).
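You can watch this jumping yourself with something like the following sketch (GetCurrentProcessorNumber needs Windows Vista or later; the loop count is arbitrary):

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        // Print which logical processor the thread is on; after yielding to
        // the scheduler, the reported number frequently changes.
        for (int i = 0; i < 10; ++i)
        {
            printf("running on logical processor %lu\n", GetCurrentProcessorNumber());
            Sleep(1);
        }
        return 0;
    }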

I'm not sure whether the scheduler sees any difference between physical and hyper-threaded logical processors, but even if it doesn't (which I assume), the consequences are in most cases not a concern, i.e. there should not be much difference between multiple threads sharing physical or logical processors, given the same thread count. Regardless, there are some reports of cache thrashing in this scenario, mainly in high-performance, heavily multithreaded applications such as SQL Server or the .NET and Java VMs, which may or may not benefit from turning Hyper-Threading off.

Viktor Svub
I'm fairly certain it's a bit more complex than that. All threads of a single process share the same virtual address space, which (except for shared memory) does not overlap that of another process. That also means it's good for an OS to keep threads of a single process on a single core - less context to reload.
MSalters
@MSalters: I do agree that there must be some additional cost in switching context when the threads are from different processes, but I don't agree that it's automatically good for an OS to try to keep the threads on a single core, especially as the number of available processors increases. It would be interesting to quantify the additional cost of changing from one virtual address space to another, over and above the cost of switching threads.
Jono
Thanks for the edit, Viktor. Very informative. I doubt there are very many concurrency questions that can't be caveated with "actually, it's slightly more complex"... but a good approximation of the truth - like this - will point me in the right direction.
Jono
+1  A: 

I generally agree with the previous answer; however, things are more complex.

Although processes are not execution units, threads belonging to the same process should be treated differently from threads of different processes. There are two reasons for this:

  1. Same address space. This means that when switching context between such threads there is no need to reload the address translation registers.
  2. Threads of the same process are much more likely to access the same memory.

Point (2) has a great impact on the cache state. If threads read the same memory location, they reuse the L2 cache, so the whole thing speeds up. There is, however, a drawback too: once a thread changes a memory location, the corresponding cache line is invalidated in the L1 and L2 caches of the other processor, which then has to re-read it.
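As a rough way to see that invalidation cost, the following sketch (the iteration counts and the 64-byte line size are assumptions) times two threads updating counters that share a cache line versus counters padded onto separate lines; the shared-line version is typically much slower because each write invalidates the line in the other core's cache:

    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Two counters in the same cache line vs. padded onto separate lines.
    struct Packed { volatile long long a = 0, b = 0; };
    struct Padded { alignas(64) volatile long long a = 0; alignas(64) volatile long long b = 0; };

    template <typename T>
    long long run_ms()
    {
        T s;
        auto start = std::chrono::steady_clock::now();
        std::thread ta([&] { for (long long i = 0; i < 50000000; ++i) s.a = s.a + 1; });
        std::thread tb([&] { for (long long i = 0; i < 50000000; ++i) s.b = s.b + 1; });
        ta.join();
        tb.join();
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   std::chrono::steady_clock::now() - start).count();
    }

    int main()
    {
        std::printf("same cache line : %lld ms\n", run_ms<Packed>());
        std::printf("padded apart    : %lld ms\n", run_ms<Padded>());
        return 0;
    }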

So there are pros and cons to running the threads of the same process simultaneously (on different processors). BTW, this situation has a name: "gang scheduling".

valdo