I have some embarrassingly parallelizable work in a .NET 3.5 console app and I want to take advantage of hyperthreading and multi-core processors. How do I pick the best number of worker threads to make the best use of either on an arbitrary system? For example, if it's a dual core I will want 2 threads; quad core I will want 4 threads. What I'm ultimately after is determining the processor characteristics so I can know how many threads to create.

I'm not asking how to split up the work nor how to do threading; I'm asking how I determine the "optimal" number of threads on an arbitrary machine this console app will run on.

+13  A: 

I'd suggest that you don't try to determine it yourself. Use the ThreadPool and let .NET manage the threads for you.
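
As a rough illustration (not from the original answer), here is a minimal .NET 3.5 sketch of queuing partitioned work to the ThreadPool and waiting for it all to finish; chunkCount and DoChunk are placeholders for the real partitioning and per-chunk work:

    using System;
    using System.Threading;

    class ThreadPoolSketch
    {
        static void Main()
        {
            int chunkCount = 16;                       // however many pieces the work splits into
            int pending = chunkCount;
            using (ManualResetEvent done = new ManualResetEvent(false))
            {
                for (int i = 0; i < chunkCount; i++)
                {
                    int chunk = i;                     // capture the loop variable for the closure
                    ThreadPool.QueueUserWorkItem(delegate
                    {
                        DoChunk(chunk);                // hypothetical per-chunk work
                        if (Interlocked.Decrement(ref pending) == 0)
                            done.Set();                // the last chunk signals completion
                    });
                }
                done.WaitOne();                        // block until every queued item has run
            }
        }

        static void DoChunk(int index)
        {
            // placeholder for the real CPU-bound work
        }
    }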

Mark
That doesn't really help me at all. If I create 4 threads on a single-core, non-hyperthreaded system then I'm wasting resources. If I create 2 threads on a quad core then I'm only utilizing half the processor.
Colin Burnett
Using the ThreadPool does come with an overhead, but it also delegates the responsibility to a sensible party that handles the scheduling in a sensible way. You may be able to handle memory management better than the garbage collector, but do you plan on implementing your own garbage collector? I wouldn't try to second-guess how the ThreadPool works.
Mark
+2  A: 

The only way is a combination of data and code analysis based on performance data.

Different CPU families and speeds, memory speeds, and other activity on the system will all make the tuning different.

Potentially some self-tuning is possible, but this will mean having some form of live performance measurement and self-adjustment.

Richard
+6  A: 

You can use Environment.ProcessorCount if that's the only thing you're after. But usually using a ThreadPool is indeed the better option.
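
For instance, a minimal sketch that reads the count and, if work will be queued immediately, raises the ThreadPool minimum to match (see the SetMinThreads comment below):

    using System;
    using System.Threading;

    class ProcessorCountSketch
    {
        static void Main()
        {
            // Logical processor count: counts hyperthreaded logical CPUs as well as cores.
            int workers = Environment.ProcessorCount;
            Console.WriteLine("Logical processors: " + workers);

            // Optional: keep at least that many pool threads ready so the first
            // batch of work items doesn't wait for the pool to ramp up.
            int workerThreads, completionPortThreads;
            ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
            ThreadPool.SetMinThreads(workers, completionPortThreads);
        }
    }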

Joey
ProcessorCount is indeed what I'm after. Regarding the ThreadPool: as in my comment to Mark, how does the ThreadPool help me if I don't know how many threads to create?
Colin Burnett
ThreadPool determines for itself how many threads to use on a given system. Basically it uses the same property (if I remember correctly) unless you specify otherwise.
Joey
If you *know* that you will be sending work right away, you can use ThreadPool.SetMinThreads to the processor count.
Rick
A: 

It can be argued that the real way to pick the best number of threads is for the application to profile itself and adaptively change its threading behavior based on what gives the best performance.
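
A minimal sketch of that idea, assuming a hypothetical RunWithThreads helper that runs a small calibration batch of the real work on a given number of threads:

    using System;
    using System.Diagnostics;

    class SelfTuningSketch
    {
        // Try a few candidate thread counts on a calibration batch and keep the fastest.
        static int PickThreadCount()
        {
            int best = 1;
            long bestTicks = long.MaxValue;
            int max = Environment.ProcessorCount * 2;   // search a bit past the core count
            for (int threads = 1; threads <= max; threads++)
            {
                Stopwatch sw = Stopwatch.StartNew();
                RunWithThreads(threads);                // hypothetical: run the calibration batch
                sw.Stop();
                if (sw.ElapsedTicks < bestTicks)
                {
                    bestTicks = sw.ElapsedTicks;
                    best = threads;
                }
            }
            return best;
        }

        static void RunWithThreads(int threadCount)
        {
            // placeholder: run a representative slice of the real work on threadCount threads
        }
    }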

chaos
Didn't think about that one! Thanks, I'll look at that one for my project, as it's not related to the machine it's running on but to a remote SqlServer.
caveman_dick
+1  A: 

I read something on this recently (see the accepted answer to this question for example).

The simple answer is that you let the operating system decide. It can do a far better job of deciding what's optimal than you can.

There are a number of questions on a similar theme - searching for "optimal number threads" (without the quotes) gives a couple of pages of results.

ChrisF
A: 

I wrote a simple number crunching app that used multiple threads, and found that on my Quad-core system, it completed the most work in a fixed period using 6 threads.

I think the only real way to determine it is through trialling or profiling.

ck
+1  A: 

A good rule of thumb, given that you're completely CPU-bound, is processorCount + 1.

That's +1 because you will always get some tasks started/stopped/interrupted and n tasks will almost never completely fill up n processors.

Robert Munteanu
Any idea why the +1?
Colin Burnett
+1 because you will get some tasks started/stopped/interrupted and n tasks will almost never completely fill up n processors.
Robert Munteanu
Please explain the downvote - I'd like to know what I'm doing wrong. Thanks.
Robert Munteanu
I suppose there is a theory that the +1 could itself cause the starts/stops/context switches.
ShoeLace
My advice was taken from http://www.javaconcurrencyinpractice.com/ - a great book on concurrency. As for the +1 which could cause it - tasks do finish eventually, and to make up for that time during which a new task is prepared you use an extra thread which is efficiently scheduled by the OS.
Robert Munteanu
+1  A: 

I would say it also depends on what you are doing. If you're making a server application, then getting all you can out of the CPUs via either Environment.ProcessorCount or a thread pool is a good idea. But if this is running on a desktop or a machine that isn't dedicated to this task, you might want to leave some CPU idle so the machine still "functions" for the user.

EKS
It's a console app run by the user so I want to reduce the execution time by using all "processors". ProcessorCount is indeed what I want though.
Colin Burnett
+2  A: 

The optimal number would just be the processor count. Optimally you would always have one thread running per CPU (logical or physical) to minimise context switches and the overhead that comes with them.

Whether that is the right number depends (very much as everyone has said) on what you are doing. The threadpool (if I understand it correctly) pretty much tries to use as few threads as possible but spins up another one each time a thread blocks.

Blocking is never optimal, but if you are doing any form of blocking then the answer changes dramatically.

The simplest and easiest way to get good (not necessarily optimal) behaviour is to use the threadpool. In my opinion it's really hard to do any better than the threadpool, so that's simply the best place to start; only think about something else if you can demonstrate why it's not good enough.

Tollo
A: 

Or, even better than the ThreadPool, use .NET 4.0 Task instances from the TPL. The Task Parallel Library is built on infrastructure in the .NET 4.0 framework that will determine the appropriate number of threads to perform the tasks as efficiently as possible for you.
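
As a rough example, a minimal .NET 4.0 sketch (note this goes beyond the question's 3.5 constraint); Compute is a placeholder for the per-item work:

    using System;
    using System.Threading.Tasks;

    class TplSketch
    {
        static void Main()
        {
            int[] results = new int[1000];

            // Parallel.For partitions the iterations across the machine's cores;
            // the caller never picks a thread count.
            Parallel.For(0, results.Length, i =>
            {
                results[i] = Compute(i);   // hypothetical per-item work
            });
        }

        static int Compute(int i)
        {
            return i * i;                  // placeholder for the real CPU-bound work
        }
    }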

jerryjvl
+2  A: 

The correct number is obviously 42.

Now, on a serious note: just use the thread pool, always.

1) If you have a lengthy processing task (i.e. CPU intensive) that can be partitioned into multiple pieces of work, then you should partition your task and submit all the individual work items to the ThreadPool. The thread pool will pick up work items and start churning on them in a dynamic fashion, as it has self-monitoring capabilities (including starting new threads as needed) and can be configured at deployment by administrators according to the deployment site's requirements, as opposed to pre-computing the numbers at development time. While it is true that the proper partitioning size of your processing task can take into account the number of CPUs available, the right answer depends so much on the nature of the task and the data that it is not even worth talking about at this stage (and besides, the primary concerns should be your NUMA nodes, memory locality and interlocked cache contention, and only after that the number of cores).

2) If you're doing I/O (including DB calls) then you should use asynchronous I/O and complete the calls in completion routines invoked on ThreadPool threads.
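
A minimal sketch of point 2, assuming an example file "data.bin": BeginRead returns immediately and the completion callback runs on a ThreadPool I/O completion thread:

    using System;
    using System.IO;
    using System.Threading;

    class AsyncIoSketch
    {
        static void Main()
        {
            byte[] buffer = new byte[4096];
            using (ManualResetEvent done = new ManualResetEvent(false))
            using (FileStream fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read,
                                                  FileShare.Read, buffer.Length, FileOptions.Asynchronous))
            {
                // BeginRead returns immediately; the callback runs later on an
                // I/O completion thread from the ThreadPool.
                fs.BeginRead(buffer, 0, buffer.Length, delegate(IAsyncResult ar)
                {
                    int bytesRead = fs.EndRead(ar);
                    Console.WriteLine("Read " + bytesRead + " bytes");
                    done.Set();
                }, null);

                done.WaitOne();   // wait here (or do other work) until the read completes
            }
        }
    }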

These two are the only valid reasons why you should have multiple threads, and they're both best handled by using the ThreadPool. Anything else, including starting a thread per 'request' or 'connection', is in fact an anti-pattern in the Win32 API world (fork is a valid pattern in *nix, but definitely not on Windows).

For a more specialized and way, way more detailed discussion of the topic I can only recommend the Rick Vicik papers on the subject:

Remus Rusanu
A: 

In addition to the processor count, you may want to take into account the process's processor affinity by counting the bits set in the affinity mask returned by the GetProcessAffinityMask function.
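
A minimal sketch using the managed Process.ProcessorAffinity property, which exposes the same process mask; on a restricted process the bit count can be smaller than Environment.ProcessorCount:

    using System;
    using System.Diagnostics;

    class AffinitySketch
    {
        static void Main()
        {
            // ProcessorAffinity is the managed view of the process affinity mask.
            ulong mask = (ulong)(long)Process.GetCurrentProcess().ProcessorAffinity;

            int usableProcessors = 0;
            while (mask != 0)
            {
                usableProcessors += (int)(mask & 1);   // count each CPU the process may run on
                mask >>= 1;
            }
            Console.WriteLine("Processors available to this process: " + usableProcessors);
        }
    }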

wkf
A: 

If there is no excessive I/O processing and there are no heavy system calls while the threads are running, then the number of threads (excluding the main thread) should in general equal the number of processors/cores in your system; otherwise you can try to increase the number of threads by testing.

bill