I'm writing a program in C#, and I see two possible approaches:

1) A job manager that waits for any of the X running threads to finish; when one finishes, the manager gets the next chunk of work, creates a new thread, and gives it that chunk.

or

2) We create X threads up front and give each one a chunk of work. When a thread finishes its chunk, it asks the job manager for more work. If there is no more work, it sleeps and then asks again, with the sleep becoming progressively longer.
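A minimal sketch of option 1, assuming the chunks are already in memory as `int[]` arrays of data ids and that the `processChunk` delegate (hypothetical) does the database work for one chunk. A `SemaphoreSlim` caps the number of live threads at `maxThreads`, so the manager blocks until any running thread finishes and only then starts a new thread for the next chunk:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadPerChunkManager
{
    // Starts one new thread per chunk, but never more than maxThreads at once.
    // processChunk is a hypothetical stand-in for the per-chunk database work.
    public static void Run(IEnumerable<int[]> chunks, int maxThreads, Action<int[]> processChunk)
    {
        using var slots = new SemaphoreSlim(maxThreads, maxThreads);
        var started = new List<Thread>();

        foreach (var chunk in chunks)
        {
            slots.Wait(); // block until one of the running threads has finished

            var thread = new Thread(() =>
            {
                try { processChunk(chunk); }
                finally { slots.Release(); } // always hand the slot back to the manager
            });
            started.Add(thread);
            thread.Start();
        }

        foreach (var t in started) t.Join(); // wait for the last threads to finish
    }
}
```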

This program will be a run-once job, though I could see it turning into a service that continually looks for more jobs.

Each chunk will consist of a number of data ids. For each id there is a call to the database to get some information or perform an operation on it, followed by a write of information about that id back to the database.
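Splitting the data ids into chunks can be sketched as follows. The chunk size is an assumption the question leaves open, and the per-id database read and write are not shown because the question does not name them:

```csharp
using System;
using System.Collections.Generic;

public static class Chunking
{
    // Splits the data ids into consecutive chunks of at most chunkSize ids each.
    public static List<int[]> SplitIntoChunks(IReadOnlyList<int> ids, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<int[]>();
        for (int start = 0; start < ids.Count; start += chunkSize)
        {
            int length = Math.Min(chunkSize, ids.Count - start); // last chunk may be shorter
            var chunk = new int[length];
            for (int i = 0; i < length; i++) chunk[i] = ids[start + i];
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

On .NET 6 or later, `Enumerable.Chunk` from `System.Linq` performs the same split.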

+2  A: 

Assuming you are aware of the additional precautions needed when dealing with multithreaded database operations, it sounds like you're describing two different scenarios. In the first, you have several threads running, and once ALL of them finish, the manager looks for new work. In the second, you have several threads running, and their operations are completely parallel. Your environment will determine the proper approach: if something ties the work in the threads together so that additional work cannot start until all of them are finished, go with the former. If the threads don't have much effect on each other, go with the latter.
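A minimal sketch of the first scenario as described here, where the next batch of work starts only after ALL threads in the current batch have finished. `batchSize` and the `processChunk` delegate are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class BatchRunner
{
    // Runs chunks in batches: the next batch starts only when every thread
    // in the current batch has finished. Returns the number of batches run.
    public static int Run(IReadOnlyList<int[]> chunks, int batchSize, Action<int[]> processChunk)
    {
        int batches = 0;
        for (int start = 0; start < chunks.Count; start += batchSize)
        {
            var threads = new List<Thread>();
            for (int i = start; i < Math.Min(start + batchSize, chunks.Count); i++)
            {
                var chunk = chunks[i]; // per-iteration copy for the closure
                var worker = new Thread(() => processChunk(chunk));
                threads.Add(worker);
                worker.Start();
            }

            // Barrier: the slowest chunk holds up the whole batch.
            foreach (var finished in threads) finished.Join();
            batches++;
        }
        return batches;
    }
}
```

Because each `Join` loop acts as a barrier, this shape only pays off when the work in a batch genuinely has to finish together; independent chunks would just wait on each other.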

Adam Robinson
A: 

Instead of rolling your own solution, you should look at the ThreadPool class in the .NET Framework. You could use its QueueUserWorkItem method, which should do exactly what you want to accomplish.
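A minimal sketch of this suggestion, assuming each chunk is queued as one work item and `processChunk` (hypothetical) does the database work. `QueueUserWorkItem` does not let the caller wait for completion by itself, so this sketch adds a `CountdownEvent` (available from .NET 4) to block until every queued chunk has run:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoolRunner
{
    // Queues each chunk on the shared .NET ThreadPool and waits until all of them have completed.
    public static void Run(IReadOnlyList<int[]> chunks, Action<int[]> processChunk)
    {
        using var done = new CountdownEvent(chunks.Count);
        foreach (var chunk in chunks)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { processChunk(chunk); }
                finally { done.Signal(); } // count down once per finished chunk
            });
        }
        done.Wait(); // returns when the count reaches zero
    }
}
```

As the comments below note, the pool is tuned for short work items, so this fits best when each chunk finishes quickly.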

EFrank
@EFrank: If the work is long-running, then that would not be a good idea.
casperOne
@EFrank (+1 to casperOne): The ThreadPool class uses system threads and is designed for short-lived operations (and fairly infrequent use). If your workload is more demanding, you should create your own threads.
Adam Robinson
+1  A: 

The second option isn't really right: making the sleep time progressively longer means the threads stay blocked in their sleep unnecessarily, even after new work has arrived.

Rather, you should have a pooled set of threads like the second option, but have them use WaitHandles to wait for work, following a producer/consumer pattern. When the producer has work, it signals a consumer (a manager determines which thread gets the work and then signals that thread), which wakes up and starts working.
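A minimal sketch of the producer/consumer idea, assuming an in-memory queue. A `System.Threading.Semaphore` (a `WaitHandle`) counts the queued chunks, so idle workers block in `WaitOne` instead of sleeping and polling, and one null chunk per worker acts as a shutdown signal. For brevity it simplifies the design described above: any free worker takes the next chunk rather than a manager picking a specific thread. Summing the ids is a placeholder for the database work:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkQueueDemo
{
    // Feeds chunks to a fixed set of worker threads through a shared queue.
    // Returns the sum of all ids processed (placeholder for the real database work).
    public static long Run(IEnumerable<int[]> chunks, int workerCount)
    {
        var queue = new ConcurrentQueue<int[]>();
        using var available = new Semaphore(0, int.MaxValue); // counts queued chunks
        long total = 0;

        var workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = new Thread(() =>
            {
                while (true)
                {
                    available.WaitOne();             // block until the producer signals work
                    queue.TryDequeue(out var next);  // succeeds: each Release follows an Enqueue
                    if (next == null) return;        // null chunk = no more work, exit
                    foreach (var id in next)
                        Interlocked.Add(ref total, id); // stand-in for per-id database work
                }
            });
            workers[i].Start();
        }

        // Producer: enqueue each chunk, then wake one waiting worker.
        foreach (var chunk in chunks)
        {
            queue.Enqueue(chunk);
            available.Release();
        }

        // One shutdown marker per worker; FIFO order puts them behind all real chunks.
        for (int i = 0; i < workerCount; i++)
        {
            queue.Enqueue(null);
            available.Release();
        }

        foreach (var worker in workers) worker.Join();
        return total;
    }
}
```

On .NET 4 and later, `BlockingCollection<T>` in `System.Collections.Concurrent` packages this same blocking-queue pattern.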

You might want to look into the Task Parallel Library. It's in beta now, but if you can use it and are comfortable with it, I would recommend it, as it will manage a great deal of this for you, and do it better, taking into account the number of cores on the machine, the optimal number of threads, and so on.
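The library shipped in .NET 4 as the Task Parallel Library. A minimal sketch, assuming chunks are `int[]` arrays of data ids and `processChunk` (hypothetical) does the database work: `ParallelOptions.MaxDegreeOfParallelism` caps how many chunks run at once, which is one way to keep a database-bound job from running more concurrent work than the database can handle:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TplRunner
{
    // Processes every chunk with the TPL, never running more than maxConcurrency chunks at once.
    public static void Run(IEnumerable<int[]> chunks, int maxConcurrency, Action<int[]> processChunk)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };
        Parallel.ForEach(chunks, options, processChunk); // returns once every chunk has been processed
    }
}
```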

casperOne
The problem with the TPL is that it maxes out the CPU, and we are more DB-bound, so we need to manage the number of threads more closely. Is the producer/consumer pattern option 1, except that instead of creating a new thread after one finishes, the finished thread goes to sleep, and the producer looks for a waiting thread and sends it more work?
+1  A: 

The former solution (spawning a thread for each new piece of work) is easier to code, and not too bad if the units of work are large enough.

The second solution (a thread pool with a queue of work) is more complicated to code, but supports smaller units of work.

Douglas Leeder