I am trying to simplify the problem as follows:

  1. I have around 100+ files that I would like to read and then process.
  2. For this, I maintain an array of file names and locations.
  3. I spawn threads to do the job of reading the files.

Now my problem is that I would like to make sure only 5 threads are spawned at a time, as starting 100+ threads is not a good idea at all.

So please tell me what approach I should use to ensure that only 5 threads are working at a time, and that as soon as one of them is done, a new one can be started.

Thanks, all.

+3  A: 

Split your file list into 5 equal-size lists. Then start five threads and pass each one a separate, smaller list via ParameterizedThreadStart.
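A minimal sketch of the split-and-start approach, assuming the file names are already in a list. The names SplitListDemo, Split and Run are hypothetical, and the process callback stands in for the real read-and-process code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class SplitListDemo
{
    // Deal files round-robin into `parts` lists, so list sizes differ by at most one.
    public static List<List<string>> Split(IList<string> files, int parts)
    {
        var lists = new List<List<string>>();
        for (int i = 0; i < parts; i++) lists.Add(new List<string>());
        for (int i = 0; i < files.Count; i++) lists[i % parts].Add(files[i]);
        return lists;
    }

    // Start one dedicated thread per list and block until all of them finish.
    public static void Run(IList<string> files, int threadCount, Action<string> process)
    {
        var threads = new List<Thread>();
        foreach (var list in Split(files, threadCount))
        {
            // ParameterizedThreadStart receives its argument as a plain object.
            var thread = new Thread(new ParameterizedThreadStart(state =>
            {
                foreach (var file in (List<string>)state) process(file);
            }));
            thread.Start(list);
            threads.Add(thread);
        }
        foreach (var thread in threads) thread.Join();
    }

    public static void Main()
    {
        // Hypothetical file names; replace the lambda with real read-and-process code.
        var files = Enumerable.Range(1, 103).Select(i => $"file{i}.txt").ToList();
        int processed = 0;
        Run(files, 5, file => Interlocked.Increment(ref processed));
        Console.WriteLine(processed); // 103
    }
}
```

Because the lists are fixed up front, a thread whose list happens to contain large files keeps running while the other four sit idle; this is the main difference from starting a new thread as soon as one finishes.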

However, since the work is almost entirely I/O-bound, this process is not likely to benefit from threading.

Sam
@Sam: I'm not so sure that multithreaded I/O brings no benefit. There is file data to be copied around in memory, the OS has better knowledge of the reads and could optimize them, the disk could support parallel I/O (RAID?), etc. Of course, we cannot know until we actually measure it, so it is premature to draw conclusions.
Moron
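One way to do that measuring is a quick timing harness that reads the same files sequentially and then with at most 5 concurrent readers. This is a rough sketch, not a benchmark: the class name, the 100 generated 64 KB temp files and the use of Parallel.ForEach for the parallel pass are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ReadTiming
{
    // Reads every file once sequentially, then again with up to `dop` concurrent readers.
    // Returns the bytes read and elapsed time of each pass.
    public static (long SeqBytes, TimeSpan SeqTime, long ParBytes, TimeSpan ParTime)
        Measure(IReadOnlyList<string> files, int dop)
    {
        var sw = Stopwatch.StartNew();
        long seqBytes = 0;
        foreach (var f in files) seqBytes += File.ReadAllBytes(f).Length;
        TimeSpan seqTime = sw.Elapsed;

        sw.Restart();
        long parBytes = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = dop };
        Parallel.ForEach(files, options, f =>
            Interlocked.Add(ref parBytes, File.ReadAllBytes(f).Length));
        TimeSpan parTime = sw.Elapsed;

        return (seqBytes, seqTime, parBytes, parTime);
    }

    public static void Main()
    {
        // Create 100 throwaway 64 KB files so the sketch runs anywhere.
        string dir = Path.Combine(Path.GetTempPath(), "read-timing-demo");
        Directory.CreateDirectory(dir);
        var files = new List<string>();
        var data = new byte[64 * 1024];
        for (int i = 0; i < 100; i++)
        {
            string path = Path.Combine(dir, $"f{i}.bin");
            File.WriteAllBytes(path, data);
            files.Add(path);
        }

        var r = Measure(files, 5);
        Console.WriteLine($"sequential: {r.SeqBytes} bytes in {r.SeqTime.TotalMilliseconds:F1} ms");
        Console.WriteLine($"parallel:   {r.ParBytes} bytes in {r.ParTime.TotalMilliseconds:F1} ms");

        Directory.Delete(dir, recursive: true);
    }
}
```

The second pass can be served from the OS file cache, so swap the order, repeat the runs, and use the real files before drawing conclusions.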
+2  A: 

You should take a look at System.Threading.ThreadPool.SetMaxThreads.

rerun
Using SetMaxThreads is not really recommended unless you really know what you are doing. You are capping the shared thread pool, and the libraries you use might end up being affected by it.
Moron
I gave this a -1 because it is usually bad practice to use it.
Moron
A: 

Do your processing through the ThreadPool, then call ThreadPool.SetMaxThreads:

http://msdn.microsoft.com/en-us/library/system.threading.threadpool.setmaxthreads.aspx
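For reference, a minimal sketch of calling ThreadPool.SetMaxThreads(int workerThreads, int completionPortThreads). It checks the bool result, because the runtime can reject the call, and then restores the previous limit; PoolCap and TryCapAndRestore are hypothetical names. As the comments point out, the cap applies to the whole process-wide pool.

```csharp
using System;
using System.Threading;

public static class PoolCap
{
    // Tries to cap the shared pool's worker threads at `limit`, then restores the old maximum.
    // Returns whether the runtime accepted the cap.
    public static bool TryCapAndRestore(int limit)
    {
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);

        // Per the documentation, SetMaxThreads returns false and changes nothing if the
        // value is below the processor count or below the current minimum.
        bool accepted = ThreadPool.SetMaxThreads(limit, maxIo);

        // If accepted, items queued here would run on at most `limit` worker threads,
        // and so would work queued by every other library in the process.

        ThreadPool.SetMaxThreads(maxWorker, maxIo);
        return accepted;
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        Console.WriteLine($"processors={Environment.ProcessorCount}, min worker threads={minWorker}");
        Console.WriteLine(TryCapAndRestore(5) ? "cap of 5 accepted" : "cap of 5 rejected");
    }
}
```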

donjay
Using SetMaxThreads is not really recommended unless you really know what you are doing. You are capping the _shared_ thread pool, and the libraries you use might end up being affected by it.
Moron
I gave this a -1 because it is usually bad practice to use it.
Moron
+1  A: 

Though this might not answer your question directly, it seems that a producer-consumer design would fit your needs. Also, this might help.
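A minimal producer-consumer sketch, assuming .NET 4.0 or later for BlockingCollection<T> and tasks: one producer feeds file names into a bounded queue and exactly five consumers drain it. FileQueue, ProcessAll and the process callback are hypothetical names; Thread.Sleep stands in for the real file work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FileQueue
{
    // One producer adds file names to a bounded queue; `consumers` long-running tasks
    // take and process them, so at most that many files are in progress at once.
    // Error handling is omitted for brevity.
    public static int ProcessAll(string[] files, int consumers, Action<string> process)
    {
        using var queue = new BlockingCollection<string>(boundedCapacity: 10);
        int done = 0;

        Task[] workers = Enumerable.Range(0, consumers)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                // Blocks until an item arrives; ends once CompleteAdding has been
                // called and the queue is empty.
                foreach (var file in queue.GetConsumingEnumerable())
                {
                    process(file);
                    Interlocked.Increment(ref done);
                }
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        foreach (var file in files) queue.Add(file); // producer; blocks while the queue is full
        queue.CompleteAdding();
        Task.WaitAll(workers);
        return done;
    }

    public static void Main()
    {
        // Hypothetical file names; Thread.Sleep stands in for reading and parsing a file.
        string[] files = Enumerable.Range(1, 100).Select(i => $"file{i}.txt").ToArray();
        int n = ProcessAll(files, 5, file => Thread.Sleep(10));
        Console.WriteLine(n); // 100
    }
}
```

Unlike five fixed lists, a consumer that finishes a small file immediately takes the next one, which gives the "start a new one as soon as one is done" behavior the question asks for.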

KMan
-1: How does this answer help if you want to do multithreaded IO?
Moron
Thanks for your comment, Moron. Why do you assume that it would not provide multithreaded I/O? Don't you think you can request multiple threads from the thread pool to produce, and multiple threads to consume as well?
KMan
Producer-Consumer is about _data_. The assumption is that the threads are already there and are running the consumers/producers. If you just wanted to say "use the ThreadPool", just say that. Saying producer-consumer is pointless and causes confusion.
Moron
+2  A: 

I usually take this approach:

Declare a shared integer variable to track the number of working threads. When a job is assigned to a thread (simply queue the job into the ThreadPool), increment the value. When a thread completes the job, decrement the value.

Make sure the increment and decrement of the integer are atomic.

In the job dispatcher, fetch a job and assign it to a thread only if the number of working threads is less than the maximum. Otherwise, wait for a signal (which will be triggered by a worker thread completing a job). If you want it even simpler, let the dispatcher spin in an empty loop to wait.
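A minimal sketch of this dispatcher, assuming an AutoResetEvent as the signal and ThreadPool.QueueUserWorkItem for the jobs. CountingDispatcher, Run and the work callback are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Threading;

public static class CountingDispatcher
{
    // Queues each job to the ThreadPool, but only while fewer than maxWorkers jobs are running.
    public static void Run(string[] jobs, int maxWorkers, Action<string> work)
    {
        int working = 0;                              // shared count of running jobs
        // Not disposed: a finishing worker may still call Set() just after the count hits zero.
        var slotFreed = new AutoResetEvent(false);

        foreach (var job in jobs)
        {
            // At the limit: wait for a signal from a finishing worker, then re-check.
            while (Volatile.Read(ref working) >= maxWorkers)
                slotFreed.WaitOne();

            Interlocked.Increment(ref working);       // atomic increment on dispatch
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { work(job); }
                finally
                {
                    Interlocked.Decrement(ref working); // atomic decrement on completion
                    slotFreed.Set();                    // wake the dispatcher
                }
            });
        }

        // Wait for the last jobs to finish.
        while (Volatile.Read(ref working) > 0)
            slotFreed.WaitOne();
    }

    public static void Main()
    {
        // Hypothetical file names; Thread.Sleep stands in for reading a file.
        string[] files = Enumerable.Range(1, 50).Select(i => $"file{i}.txt").ToArray();
        int processed = 0;
        Run(files, 5, file => { Thread.Sleep(10); Interlocked.Increment(ref processed); });
        Console.WriteLine(processed); // 50
    }
}
```

The while loops re-check the count after every wake-up, so a signal that arrives before the dispatcher starts waiting, or several completions that collapse into one signal, cannot cause a hang.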

The good point is that the maximum value is configurable, and it takes advantage of the built-in ThreadPool. Writing a consumer/producer model to solve such a small problem is costly.

SiLent SoNG
+4  A: 

I vote for the Task Parallel Library / Rx (included in .NET 4.0, but downloadable for 3.5):

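        // Requires: using System.Threading.Tasks;
        var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };

        // GetListOfFiles() and DoStuffWithFile() stand for your own methods.
        Parallel.ForEach(GetListOfFiles(), options, file =>
        {
            DoStuffWithFile(file);
        });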

Note that this will use up to 5 threads, but I've seen it use fewer.
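A runnable variant of the snippet above that records how many loop bodies run at the same moment, assuming Thread.Sleep as a stand-in for DoStuffWithFile. ParallelCap and PeakConcurrency are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCap
{
    // Runs `count` fake file jobs with MaxDegreeOfParallelism = dop and returns the
    // highest number of jobs that were observed running at the same moment.
    public static int PeakConcurrency(int count, int dop)
    {
        int running = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = dop };

        Parallel.ForEach(Enumerable.Range(0, count), options, _ =>
        {
            int now = Interlocked.Increment(ref running);
            // Raise `peak` to `now` with a compare-and-swap loop (no lock needed).
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
            Thread.Sleep(5);                          // stand-in for DoStuffWithFile
            Interlocked.Decrement(ref running);
        });

        return peak;
    }

    public static void Main()
    {
        Console.WriteLine($"peak concurrency: {PeakConcurrency(100, 5)}");
    }
}
```

The printed peak should never exceed 5. MaxDegreeOfParallelism is only an upper bound, so on a machine with few cores, or before the thread pool has added threads, the peak can be lower, which matches the observation above.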