5, 100, 1000?

I guess "it depends", but on what?

What is common in applications that run as server daemons / services?

What are hard limits?

Given that the machine can handle the overall workload, how do I determine at how many threads the overhead starts to have an impact on performance?

What are the important differences between OSes?

What else should be considered?

I'm asking because I would like to employ threads in an application to organize subcomponents of my application that do not share data and are designed to do their work in parallel. As the application would also use thread pools for parallelizing some tasks, I was wondering at what point I should start thinking about the total number of threads that will be running.

I know the n+1 rule as a guideline for determining the number of threads that simultaneously work on the same task to gain performance. However, I want to use threads like one might use processes at a larger scope, i.e. to organize independent tasks that should not interfere with each other.

In this related question, some people advise minimising the number of threads because of the added complexity. To me it seems that threads can also help keep things more orderly and actually reduce interference. Isn't that correct?

+4  A: 

I can't answer your question about "how much is many" but I agree that you should not use threads for every task possible.

The optimal number of threads for application performance is (n+1), where n is the number of processors/cores your computer/cluster has.

The further your actual thread count strays from n+1, the less optimal it gets, and the more system resources are wasted on thread scheduling.

So usually you use one thread for the UI, one thread for generic background tasks, and (n+1) threads for the heavy-calculation tasks.
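A minimal sketch of the n+1 sizing rule in Python (the workload function is a placeholder; `os.cpu_count()` reports logical cores):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(x):
    # Stand-in for a heavy calculation.
    return sum(i * i for i in range(x))

n = os.cpu_count() or 1          # number of cores, with a fallback
with ThreadPoolExecutor(max_workers=n + 1) as pool:
    results = list(pool.map(cpu_bound, [100_000] * (n + 1)))
```

Note that in CPython the GIL keeps pure-Python CPU-bound work from actually running in parallel on threads; the sizing rule applies directly in runtimes without such a lock.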

Max
Well, I know the n+1 rule as a guideline for determining the number of threads that simultaneously work on the same task to gain performance. However, I want to use threads like one might use processes at a larger scope, i.e. to organize independent tasks that should not interfere with each other.
That depends on what kind of tasks they are, but in 99% of cases you can implement it with 1-2 threads. For example, if you build a game server with 1k+ NPCs walking the streets, you usually loop through queues: you have a queue of NPCs, and whenever a thread is freed, it takes one NPC from the queue.
Max
The optimal number of threads varies *widely* depending on what the application is doing. If it's heavily IO-bound, the optimal number of threads may be significantly higher than the number of cores. Similarly, your application may have a number of threads in a sleep/wait state.
Greg Beech
Yes, the n+1 rule is *optimal* for uber-calculation tasks, where the task depends only on calculation speed, with no other factors at all.
Max
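The queue pattern Max describes (a small fixed pool of workers pulling independent tasks, instead of one thread per NPC) might be sketched in Python like this; `update_npc` is just a placeholder for the per-NPC work:

```python
import queue
import threading

task_q = queue.Queue()
results = []
lock = threading.Lock()

def update_npc(npc_id):
    # Placeholder for whatever per-NPC work the game loop needs.
    return npc_id * 2

def worker():
    while True:
        npc = task_q.get()
        if npc is None:          # sentinel: no more work
            break
        r = update_npc(npc)
        with lock:
            results.append(r)

# Two worker threads handle 1000 NPCs instead of 1000 threads.
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for npc in range(1000):
    task_q.put(npc)
for _ in threads:
    task_q.put(None)             # one sentinel per worker
for t in threads:
    t.join()
```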
A: 

Microsoft's ThreadPool class limits you to 25 threads per processor. The limit is based on context switching between threads and the memory consumed by each thread. So, that's a good guideline if you're on the Windows platform.

Anthony Mastrean
+1  A: 

Actually ajmastrean is a little out of date. Quoting from his own link:

The thread pool has a default size of 250 worker threads per available processor, and 1000 I/O completion threads. The number of threads in the thread pool can be changed by using the SetMaxThreads method.

But generally I think 25 is really where the law of diminishing returns (and programmers' ability to keep track of what is going on) starts to kick in. Although Max is right that n+1 is the optimal number as long as all of the threads are performing non-blocking calculations, in the real world most of the threading tasks I perform tend to involve some kind of IO.

Kris Erickson
I'm not out of date, Microsoft is waaay ahead. Here in Big-Corporation, we still use the .NET 2.0 Framework.
Anthony Mastrean
Actually the thread pool size was changed in .NET 2.0. We are still stuck in that framework too.
Kris Erickson
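The same idea of capping the pool rather than letting the thread count grow unbounded exists outside .NET too. As a rough Python analog (not the .NET API), `ThreadPoolExecutor`'s `max_workers` parameter plays the role of `SetMaxThreads`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

seen_threads = set()
lock = threading.Lock()

def task(i):
    # Record which pool thread ran this task.
    with lock:
        seen_threads.add(threading.current_thread().name)
    return i

# 100 tasks, but never more than 4 threads alive in the pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, range(100)))
```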
+1  A: 

It also depends on your architecture. E.g. with NVIDIA's GPGPU library CUDA you can run 512 threads simultaneously on a multiprocessor that has only 8 scalar processors. You may ask: why assign 64 threads to each of the scalar processors? The answer is easy: if the computation is not compute-bound but memory-IO-bound, you can hide the memory latencies by executing other threads. Something similar applies to normal CPUs. I remember that a recommendation for make's parallel option "-j" is to use approx. 1.5 times the number of cores you have. Many of the compilation tasks are heavily IO-bound, and if a task has to wait for the hard disk, memory ... whatever, the CPU can work on a different thread.
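The latency-hiding point can be demonstrated with a toy benchmark, where `time.sleep` stands in for a blocked disk or network wait; with more threads than the single "core" we allow, the waits overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    time.sleep(0.1)              # stand-in for waiting on disk/network

def run(workers, tasks=8):
    # Return wall-clock time to finish all tasks with this many threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(io_task, range(tasks)))
    return time.perf_counter() - start

serial = run(1)     # one thread sits through every wait in sequence
parallel = run(8)   # eight waits overlap, so total time shrinks
```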

Next you have to consider how expensive a task/thread switch is. E.g. on the GPU it comes essentially for free, while the CPU has to perform some work for a context switch. So in general you have to estimate whether the penalty of two task switches is longer than the time the thread would have blocked (which depends heavily on your application).

flolo