Suppose I have X cores on my machine and I start X threads. Let's assume for the sake of argument that each thread is completely separate in terms of the memory, disk, etc. it uses. Will the OS know to send each thread to its own core, or will it do more time-slicing of multiple threads on one core? What the question boils down to is this: if I have X cores and my program must do independent calculations, should I start X threads? Will each one get piped to its own core, or is the presumption that having X cores means I can start X threads completely wrong? I'm thinking it is. This is with C#.
I guess this might depend on the platform and OS. In my experience with a C++ console application on Linux, using X threads on X cores is exactly the right thing if you need to squeeze as much performance as possible out of a machine. Note, however, that any concurrent task (including the GUI) will eat into the CPU time available to your program. On a dedicated server without a GUI, I have had each core 99-100% used exclusively by my program.
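To make the one-thread-per-core approach concrete, here is a minimal C++ sketch (the function name is illustrative, not from any library) that splits an independent summation across std::thread::hardware_concurrency() worker threads. Since the slices share no mutable state, the scheduler is free to place one worker on each core:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum a vector by giving each hardware thread an independent slice.
// There is no shared mutable state between slices, so the OS scheduler
// is free to park one worker thread on each core.
long long parallel_sum(const std::vector<int>& data) {
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<long long> partial(n, 0);
    std::vector<std::thread> workers;
    std::size_t chunk = (data.size() + n - 1) / n;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            std::size_t begin = std::min(data.size(), i * chunk);
            std::size_t end   = std::min(data.size(), begin + chunk);
            partial[i] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& t : workers) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

On a lightly loaded machine this kind of workload typically keeps all cores busy, matching the 99-100% utilization described above.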
Since C# uses native threads, I feel I can comment, even though my experience is mostly with Java (on Windows). In general, the OS will try to balance the load, so if you max out a core with a computationally intensive task on one thread, few other threads will be scheduled on that core.
I recently wrote some CPU-intensive multi-threaded code using a task framework, where the work is broken into small tasks and fed to N queues, each queue owned by a thread. I got roughly linear speed-up as I increased the number of threads from 1 to X, where X was the number of cores.
So in general, the answer is yes: you can expect the OS to do the right thing, especially as the number of threads increases and approaches the number of cores.
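The per-thread queue scheme described above (small tasks dealt into N queues, each queue owned by one thread) can be sketched roughly like this in C++. The class name is illustrative, and the sketch assumes all tasks are submitted before the workers start, so the queues need no locking:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <thread>
#include <vector>

// Illustrative sketch of a per-thread queue scheme: tasks are dealt
// round-robin into N queues, then one thread per queue drains its own
// queue. Submission finishes before run() starts, so no locks are needed.
class WorkerPool {
public:
    explicit WorkerPool(unsigned n_threads) : queues_(n_threads) {}

    void submit(std::function<void()> task) {
        queues_[next_++ % queues_.size()].push(std::move(task));
    }

    void run() {  // start one thread per queue, wait for all to finish
        std::vector<std::thread> threads;
        for (auto& q : queues_)
            threads.emplace_back([&q] {
                while (!q.empty()) {
                    q.front()();
                    q.pop();
                }
            });
        for (auto& t : threads) t.join();
    }

private:
    std::vector<std::queue<std::function<void()>>> queues_;
    std::size_t next_ = 0;
};
```

A real framework would also need work stealing or locking for tasks submitted while workers run; this only shows why thread count close to core count scales well for independent work.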
I'm going to say no...
The .NET team introduced the TPL (Task Parallel Library) to explicitly delegate thread execution across multiple cores. Windows Vista didn't have much intelligence built in to support the OS distributing threads across multiple cores, so I'm not surprised to see this improvement in the .NET Framework (4.0), considering that Windows 7 has much improved support for multiple cores.
It would entirely depend on how much work each thread is going to do. If you were to start up 4 threads on a 4-core machine and simply run a tight loop in each, they would most likely consume 100% of total CPU time.
On the wider question of whether, given k threads and k cores, the OS will automatically schedule threads 0 to k-1 on cores 0 to k-1: this cannot be guaranteed. In general, when a thread becomes ready to run, it will be allocated to the next available CPU. That said, the OS will, I believe, be intelligent and try to reuse the core the thread previously ran on, since that thread's local data is likely still cached there. However, in today's world of shared processor caches, this isn't a prerequisite for good thread scheduling.
You can influence a thread's affinity for a given core by setting its processor affinity (in .NET, the ProcessThread.ProcessorAffinity property; the underlying Win32 call is SetThreadAffinityMask). However, I tend to shy away from doing this, because the OS is generally pretty good at scheduling your threads sensibly.
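For reference, here is what pinning looks like at the OS level. This is a Linux-specific sketch using pthread_setaffinity_np (a non-portable GNU extension; on Windows/.NET the rough equivalent is setting the processor affinity mask). It restricts the calling thread to core 0 and reads the mask back to confirm:

```cpp
#include <pthread.h>
#include <sched.h>

// Linux-specific sketch: restrict the calling thread to core 0, then
// read the affinity mask back to confirm. Returns true on success.
bool pin_current_thread_to_core0() {
    cpu_set_t wanted;
    CPU_ZERO(&wanted);
    CPU_SET(0, &wanted);  // core 0 always exists
    if (pthread_setaffinity_np(pthread_self(), sizeof(wanted), &wanted) != 0)
        return false;

    cpu_set_t actual;
    CPU_ZERO(&actual);
    if (pthread_getaffinity_np(pthread_self(), sizeof(actual), &actual) != 0)
        return false;
    return CPU_ISSET(0, &actual);  // mask now includes core 0
}
```

As the answer says, it is usually better to leave affinity alone: a pinned thread cannot be migrated even when its core is busy and other cores sit idle.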
CAUTION
There are some interesting issues with memory access patterns across multiple threads (so-called false sharing) that will cause threads to slow each other down even where there is no locking involved.
Let's say you have a large array of values and you want n threads to operate on them. You must ensure that each thread accesses data on a separate cache line from the data accessed by other threads - a low-level issue that .NET programmers (unlike those who grew up on C++ or lower-level platforms) are not used to dealing with.
The problem is excellently demonstrated in this article from MSDN magazine. It makes for fascinating reading.
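A minimal C++ sketch of the standard fix (the 64-byte cache-line size is an assumption; it is typical on x86 but not universal): pad each thread's counter so that neighbouring counters land on different cache lines.

```cpp
#include <array>
#include <cstddef>
#include <thread>

// Each thread increments only its own counter. alignas(64) pads every
// counter out to an assumed 64-byte cache line, so neighbouring counters
// never share a line and the threads don't invalidate each other's caches.
struct alignas(64) PaddedCounter {
    long value = 0;
};

std::array<long, 4> run_padded_counters(long iterations) {
    std::array<PaddedCounter, 4> counters{};
    std::array<std::thread, 4> threads;
    for (std::size_t i = 0; i < threads.size(); ++i)
        threads[i] = std::thread([&counters, i, iterations] {
            for (long k = 0; k < iterations; ++k)
                ++counters[i].value;  // touches only this thread's line
        });
    for (auto& t : threads) t.join();
    return {counters[0].value, counters[1].value,
            counters[2].value, counters[3].value};
}
```

With the alignas removed, the four longs would sit on one cache line and every increment would force coherency traffic between cores, even though no thread ever reads another's counter.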
Typically it's up to the OS scheduler to assign tasks to the executing cores. Let N be the number of tasks to run and X the number of execution cores.
If N < X, your machine's resources will not be fully employed unless you have other tasks running. If N >= X, the OS will make a best effort to load-balance the threads across all available cores. In reality, you can't guarantee that all tasks will run on separate cores unless you enforce affinity on each task's thread. As a matter of fact, if you have an older OS that doesn't understand SMT processors, it can get fooled and allocate multiple tasks to a single core while other cores sit idle.
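A small C++ illustration of the N-versus-X decision: query the logical processor count (which counts SMT siblings as separate processors) and clamp the worker count to it. Note that hardware_concurrency() is allowed to return 0 when the count is unknown, so the sketch falls back to 1:

```cpp
#include <algorithm>
#include <thread>

// Choose how many worker threads to start for n_tasks independent tasks:
// never more than the number of logical processors X (SMT siblings count
// as separate logical processors), and never more than the tasks themselves.
unsigned choose_thread_count(unsigned n_tasks) {
    unsigned x = std::thread::hardware_concurrency();
    if (x == 0) x = 1;  // the standard allows 0 when X is unknown
    return std::min(n_tasks, x);
}
```

Starting more CPU-bound threads than this just adds context-switching overhead; starting fewer leaves cores idle, as the answer above notes.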