Scenario

I have a very heavy number-crunching process that pulls large datasets from 3 different databases and then does a bit of processing on each to eventually produce a result. This process is fine when it is only run for a single asset. However, I now have 3500 assets to process, which takes about 1hr30mins with the process in its current state.

Question

What is my best option for speeding this process up with a multi-threaded C# application? Realistically I don't have to share anything between the processing of each asset, so I'm confident that processing multiple assets at a time shouldn't cause too many issues.

Thoughts

I've heard good things about thread pools, but realistically I want something that isn't too huge to implement, is easily understandable and can run a decent number of threads at a time.

Help would be greatly appreciated.

EDIT: I CAN'T GET MY ACCEPT RATE UP BECAUSE WE HAVE TO USE IE6 AND IT'S LOCKED DOWN - MEANING THE JSCRIPT DOESN'T WORK, MEANING NONE OF THE FANCY STUFF WORKS. EDIT: NO I CAN'T GO HOME AND ACCEPT ANSWERS AFTER WORK, 12 HOURS SITTING IN FRONT OF A COMPUTER EVERY DAY MEANS IT'S THE LAST THING I WANT TO SEE WHEN I LEAVE THE OFFICE.

+2  A: 

If you don't have a multi-core processor or multiple machines, and the processing is not I/O-bound, multithreading will not help. Start by profiling the current processing to see where the time is going.
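
A quick way to get a first read, before reaching for a full profiler, is to time the two big stages with Stopwatch and see whether the database pulls or the in-memory crunching dominates. This is only a sketch: LoadFromDatabases and Crunch are hypothetical stand-ins for whatever the real pipeline does.

    using System;
    using System.Diagnostics;

    // Hypothetical stand-ins for the real pipeline stages.
    static object LoadFromDatabases(int assetId) { return new object(); }
    static double Crunch(object data) { return 0.0; }

    static void ProfileOneAsset(int assetId)
    {
        var loadTimer = Stopwatch.StartNew();
        var data = LoadFromDatabases(assetId);   // the three database pulls
        loadTimer.Stop();

        var crunchTimer = Stopwatch.StartNew();
        Crunch(data);                            // the CPU-bound processing
        crunchTimer.Stop();

        Console.WriteLine("Asset {0}: load {1} ms, crunch {2} ms",
            assetId, loadTimer.ElapsedMilliseconds, crunchTimer.ElapsedMilliseconds);
    }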

Thread pools are fine, and you can use a task queue to do simple load balancing, but if there are no spare CPU cycles in the current application this would be a waste of time.

Steven A. Lowe
A: 

The nicest option would be to use the new Task Parallel Library in .NET 4, if you can do this using the VS 2010 RC. It has built-in load balancing and work-stealing queues, so it will make this task easy to thread, and very scalable.
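
With the TPL this can be as small as a Parallel.ForEach over the asset list. A sketch only: ProcessAsset stands in for your existing single-asset routine, and the MaxDegreeOfParallelism cap is optional.

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    static void ProcessAsset(int assetId) { /* existing per-asset work */ }

    static void ProcessAll(IEnumerable<int> assetIds)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        Parallel.ForEach(assetIds, options, assetId => ProcessAsset(assetId));
    }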

However, if you need to do this in .NET 3.5, I would recommend using the ThreadPool, and just using ThreadPool.QueueUserWorkItem to start each task.
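
A minimal 3.5 sketch, again with ProcessAsset standing in for the real per-asset work; the Interlocked counter and ManualResetEvent are just one way to know when the last item has finished:

    using System;
    using System.Threading;

    static void ProcessAsset(int assetId) { /* existing per-asset work */ }

    static void ProcessAllWithThreadPool(int[] assetIds)
    {
        if (assetIds.Length == 0) return;

        int pending = assetIds.Length;
        using (var done = new ManualResetEvent(false))
        {
            foreach (int assetId in assetIds)
            {
                int id = assetId;   // copy to avoid capturing the loop variable
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { ProcessAsset(id); }
                    finally
                    {
                        if (Interlocked.Decrement(ref pending) == 0)
                            done.Set();
                    }
                });
            }
            done.WaitOne();
        }
    }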

If your tasks are all very computationally intensive for their entire lifetime, you may want to prevent having too many running concurrently. Some form of queue which you pull work from and execute can be beneficial in this case: place all of your work items into a queue, and have a limited number of threads pull work from the queue (with appropriate locking) and process it, as in the sketch below.
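
A hand-rolled version of that queue might look like the following (a sketch only; ProcessAsset is again a stand-in, and in .NET 4 a BlockingCollection would do the same job with less code):

    using System.Collections.Generic;
    using System.Threading;

    static void ProcessAsset(int assetId) { /* existing per-asset work */ }

    static void ProcessWithWorkers(IEnumerable<int> assetIds, int workerCount)
    {
        var queue = new Queue<int>(assetIds);
        var workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            var t = new Thread(() =>
            {
                while (true)
                {
                    int assetId;
                    lock (queue)
                    {
                        if (queue.Count == 0) return;   // nothing left to do
                        assetId = queue.Dequeue();
                    }
                    ProcessAsset(assetId);
                }
            });
            t.Start();
            workers.Add(t);
        }

        foreach (var t in workers) t.Join();
    }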

If you have a multi-core system, and CPU cycles are your bottleneck, this should scale very well.

Reed Copsey
+2  A: 

In .NET you can use the existing ThreadPool; there is no need to implement one yourself. Here is the relevant MSDN documentation.

You should take care not to run too many work items at once (3500 is a bit much), but the supplied queuing mechanism should get you started in the right direction.

Another thing to try is using PLINQ.
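
A PLINQ version is only a couple of lines; AsParallel and WithDegreeOfParallelism are the real API, while assetIds and ProcessAsset are assumed to be your existing collection and per-asset routine:

    using System;
    using System.Linq;

    // Sketch: fan the per-asset work out across the available cores.
    var results = assetIds.AsParallel()
                          .WithDegreeOfParallelism(Environment.ProcessorCount)
                          .Select(id => ProcessAsset(id))
                          .ToList();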

Johannes Rudolph
+1  A: 

The .NET built-in ThreadPool will meet both of your requirements: it runs a decent number of threads and is simple to work with. I have previously written an article on the subject which you can find here.
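
If "a decent number of threads" needs to be an explicit cap rather than the pool's default, the pool can be bounded before any work is queued. The numbers here are purely illustrative:

    using System;
    using System.Threading;

    // Illustrative only: cap the worker threads so the box stays responsive.
    int workerThreads, completionPortThreads;
    ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
    ThreadPool.SetMaxThreads(Environment.ProcessorCount, completionPortThreads);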

Payton Byrd
A: 

This is also a good website.

Ardman
A: 

If you're using SQL Server 2005 or later, you can create user-defined functions in C# and call them from within T-SQL procedures, which can give a marked speedup for number crunching. SQL Server is multi-threaded and does a good job with it, so consider keeping as much of the processing in the database engine as you can.
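
A minimal SQLCLR sketch, with a made-up WeightedScore calculation standing in for the real crunching; the [SqlFunction] attribute and SqlDouble types are the actual CLR integration API, and the compiled assembly still has to be registered on the server with CREATE ASSEMBLY / CREATE FUNCTION ... EXTERNAL NAME before T-SQL can call it:

    using Microsoft.SqlServer.Server;
    using System.Data.SqlTypes;

    public class AssetFunctions
    {
        // Hypothetical scalar function; the body stands in for the real work.
        [SqlFunction(IsDeterministic = true, IsPrecise = false)]
        public static SqlDouble WeightedScore(SqlDouble value, SqlDouble weight)
        {
            if (value.IsNull || weight.IsNull)
                return SqlDouble.Null;
            return value * weight;
        }
    }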

ebpower