views:

71

answers:

4

Me and a friend were debating whether the number of threads in a threadpool should be process count + 1 or just process count.

I chose just processor count because there is a even number of threads that can be distributed to each processor, and he chose processor count + 1 because he thinks it'll help him optimize performance

So my question is, who is correct?

+3  A: 

Neither of you.

The correct answer is: you need a number of threads that produces the best overall result.

Each thread you add has a cost. It's a small cost but it's definitely a cost. Put 30,000 threads in a thread pool and watch your system grind to a halt (if they're all doing something).

But each thread notionally saves you some amount of (overall) time.

Conceptually you can plot that relationship and produce a range of threads that will give you the desired result given certain resource constraints. The correct number of threads is somewhere in that range.

Note: I said "overall result". That's not necessarily the "fastest result". One thread may do a task in 10 minutes. 100 threads may do it in 15 seconds. 1000 threads may do it in 10 seconds because you're starting to hit contention limits for your particular problem. Is that a better overall result? There might be no real difference between 10 and 15 seconds but 1000 threads may use way more memory.

Remember that not all threads are CPU bound so it makes perfect sense in a lot of situations to have way more threads than the number of cores because at some point in performing a task a thread may sleep while it waits for something to happen (network communication, disk read, whatever).

cletus
+1  A: 

Neither.

The kernel of the operating system (unless explicitly instructed via affinity settings) can and will migrate the process on the available cores in your system.

If you take a look at System.Threading.ThreadPool (MSDN), Microsoft creates 250 threads per available core, by default. It is "cheaper" to reuse an idle thread than to create/destroy (rinse and repeat) threads.

Michael
A: 

The answer is 'it depends'.

You are doing pure computational math on an idle system, then the answer is 'in theory' one thread per core. In reality systems are 'never' idle and a small amount of oversubscription will help.

The precise amount of oversubcription 'depends' on how often a thread is 'computing' vs. blocked waiting for i/o (network, file, user input).

If you are doing network or file I/O adding a small amount of threads will help because the requests can be queued and submitted together, if you add too many threads then you will just flood the I/O system or controller and effectively end up with a lock convey.

.NET 4 has added a hill climbing algorithm to help detect how many threads are needed in the threadpool to perform optimally.

Similarly Windows7 add User Mode Scheduled Threads which provide a call black notification when blocking occurs so that this can be done explicitly (and the Concurrency Runtime takes advantage of this).

If you're just looking for a quick answer then try 2N - 1 for non compute bound workloads where N is small < 8.

-Rick

Rick
What is the justification for this 2N - 1 rule? It seems very irrational.
Tronic
+1  A: 

If all your threads are CPU-bound and not doing I/O, you will usually get best results with exactly the number of CPUs. Running more than the number of CPUs causes frequent context switching (which slows things down) and running less leaves some CPUs unused.

If the threads use a lot of RAM, you may need less than the number of CPUs to avoid going into swap. This often happens when compiling C++ code in parallel, as GCC (that is otherwise heavily CPU-bound) uses a lot of RAM for template instantiation.

If the threads do blocking I/O (usually to disk, but could be network or other external resource as well), you may need more than the number of CPUs, to keep the CPUs fully occupied, but on the other hand you may want less than that to avoid the I/O choking (mechanical HDDs in particular slow down when they need to read/write multiple locations at the same time).

If there are any real-time requirements on the system (e.g. games or video players), you should keep one CPU mostly idle (to allow frequent context switching and interactivity on that).

So, as others said, there is no simple answer. In all examples above I assumed that there are no other programs running on the system, using any significant amount of CPU (which surprisingly is most often the case). If there are other CPU users, consider that also in the calculations (e.g. by reducing the available CPU count).

The only useful "quick rules" for largely CPU-bound situations are then N-1, N and N+1 (where N is the number of available CPUs), depending on the factors mentioned. For heavily disk I/O bound situations only one thread (per HDD) should be used.

Tronic