views:

179

answers:

6

Hello everyone,

I am just learning how to write multithreaded programs right now, and I have a hypothetical question about how many threads are optimal for a program.

Let me describe 2 scenarios.

The first scenario is a program that is easily multithreaded, but each thread is going to be doing a lot of work (the execution time for each thread is on the order of seconds).

The second scenario is a program that is also easily multithreaded, but each thread has a very short execution time, on the order of milliseconds.

In either of these scenarios, what is the most efficient way of multithreading the program? Should I create as many threads as my system memory will allow, or should I wait for threads to finish before creating new ones, so that I only ever have a maximum of four worker threads running at any one time?

On one hand, many threads may have overhead problems with cores switching between threads (from my understanding, it's not such a severe overhead). On the other hand, if I limit the number of running threads, I'll be paying for extra condition checks and for locking and unlocking a counter variable that tracks the number of running threads, plus creating new threads when old ones finish.

I can see that if there are many small threads, it would be best to simply flood my system with as many threads as possible, since there won't be much thread switching before a thread finishes running. That would save me the overhead of constantly keeping track of the number of threads.

Also, if there are only a few large threads (by few, I mean a couple hundred or so), it would make sense to keep track of them so that the number of threads stays at an optimal level and there isn't very much thread switching (the overhead would be greater, since we would likely switch many times before a single thread finishes).

So would these assumptions be correct for each case, or is there a universal way of doing things that would be correct in all situations?

Note: this assumes a multi-core system (for now, let's ignore hyperthreading), and let's ignore the typical problems associated with multithreading (assume that all threads have private write locations and can only read from public ones; locking and unlocking happens only when incrementing or decrementing the counter for the number of active threads).

Thanks,

-Faken

+2  A: 

The normal approach to this is to make the thread count configurable, and profile the application performance across several configurations.

Also note that in many cases, it's not the overhead associated with many threads or context switching but bottlenecks caused by synchronizing access to shared resources that cause inefficiency in multithreaded apps. Even if you assume that your code is deadlock-proof, if there's a lot of IO going on, poor synchronization implementations can effectively kill any benefit your parallelization would otherwise have gained you.

Dathan
What about if each thread reads global information and then writes something straight to the hard drive? My bottleneck would be the hard drive write sequence, right? In that case, what happens if many threads each request a write (each file being a few KB at most)? Would the HD jump around writing files, or finish one file before moving on?
Faken
Depends on the drive. Most hard disk drivers (originally only SCSI, but now ATA too) support scatter-gather operations, where groups of requests are internally re-sequenced to match the movement of the drive head.
devstuff
Also, use the OS performance counters to determine where your bottleneck actually is before doing any premature optimizations.
devstuff
Additionally, all modern operating systems support write caching. Chances are good that if each thread only writes a few KB, most of the writes will be cached in memory and you won't have to worry about hard drive delay. You do potentially have an issue if the threads need to write to the same file, because even though each write will be fast, you can only provide serial access to the file, and that can become a bottleneck if many threads are running or writes are frequent.
Dathan
A: 

I would start with a reasonable number and then collect statistics to find the thread count that actually gives good performance.

Bhushan
Depends on the PC it's running on, doesn't it?
Mark
Yes, of course: the environment and the type of work the threads will do all matter.
Bhushan
+5  A: 

Scenario #1: Make n threads, where 'n' is the number of CPU cores

Scenario #2: The same, but instead of creating and killing the threads all the time, use a Workitem / Thread Pool based approach, like the .NET Parallel Framework does.

Edit: This is a good article that covers #2 - http://msdn.microsoft.com/en-us/magazine/cc163340.aspx ; let PFx figure out how many threads to run, you just describe how tasks are related to each other.

Paul Betts
Good point. If the task is going to run for microseconds it will take nearly as long to set up the thread and get it running as doing the actual work!
James Anderson
Also if you're using C++ and Visual Studio 2010, the Parallel Pattern Library and Concurrency Runtime are available (PFX is .NET). See the concurrency center for pointers to .NET and C++ code: http://msdn.microsoft.com/en-us/concurrency/default.aspx
Rick
+2  A: 

This isn't a question with a hard-and-fast answer, but here are a few points:

Since your threads are so short-lived, maybe you should think about using a pool to manage them? You could create a pool with a number of threads that suits the host system and task profile (say, one per core to start with) and feed it work via a queue of some sort. By doing this you eliminate the per-task overhead of starting a new thread, allocating a stack for it, and so on.

As for the appropriate number of threads for the pool, that rather depends on the types of tasks you are running. If they are CPU-bound, then one thread per CPU is a good fit: you avoid context switching when you don't need it. If, on the other hand, they are IO-bound tasks, say threads doing socket communication, then you might want to double that number, for example, so you can make better use of your processors while they are waiting for IO.

Anyway, in short, there is no one-size-fits-all approach for this kind of stuff. As ever, profile your code in action to find out where the inefficiencies are and tune it based on your results.

jkp
+1  A: 

Assuming you mean a Windows program: even if it is a C++ rather than a .NET program, it will repay you to skim Joe Duffy's "Concurrent Programming on Windows" before beginning. He makes a strong case for using the thread-pooling routines provided by Windows, most convincingly because they already tune themselves internally to the processor configuration, taking that burden off your shoulders.
If you then go ahead and roll your own anyway, the gotchas discussed throughout the book will surely save you from tripping over the standard pitfalls.

pngaz
A: 

Threads are not cheap. I know of basically two reasons to use them:

  1. To get multiple pieces of hardware working in parallel, whether they be CPU cores, disk heads, some other kind of machine, or servers on the other side of the world.

  2. To get multiple people working in parallel, like users with their own sessions. Here the advantage is not speed, but ease of coding each user's interaction sequence.

Or both, like having one thread to crunch numbers and one to respond to the user.

Mike Dunlavey