views:

439

answers:

6

I am working on a problem where I need to perform a lot of embarrassingly parallel tasks. Each task is created by reading data from the database, but a collection of all tasks would exceed the amount of memory on the machine, so tasks have to be created, processed, and disposed of incrementally. I am wondering what would be a good approach to solve this problem? I am thinking of the following two approaches:

  1. Implement a synchronized task queue. Implement a producer (task creator) that reads data from the database and puts tasks in the queue (limiting the number of tasks currently in the queue to a constant value to make sure the amount of memory is not exceeded). Have multiple consumer processes (task processors) that read tasks from the queue, process them, store the results, and dispose of the tasks. What would be a good number of consumer processes in this approach?

  2. Use the .NET Parallel Extensions (PLINQ or a parallel for), but I understand that a collection of tasks has to be created (can we add tasks to the collection while processing in the parallel for?). So we would create batches of tasks -- say, N tasks at a time -- process each batch, and then read another N tasks.
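To make approach 1 concrete, here is a rough sketch assuming .NET 4's `BlockingCollection<T>` (the class names, counts, and the trivial "work" are all illustrative, not part of the original question):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class BoundedPipeline
{
    static int processed;

    // Runs a bounded producer/consumer pipeline and returns how many tasks ran.
    public static int Run(int totalTasks, int boundedCapacity)
    {
        processed = 0;
        using var queue = new BlockingCollection<int>(boundedCapacity);

        // Producer: stands in for reading rows from the database.
        // Add() blocks when the queue is full, so memory use stays bounded.
        var producer = new Thread(() =>
        {
            for (int i = 0; i < totalTasks; i++)
                queue.Add(i);
            queue.CompleteAdding();               // signal "no more tasks"
        });

        // Consumers: one per core is a reasonable start for CPU-bound work.
        var consumers = new Thread[Environment.ProcessorCount];
        for (int c = 0; c < consumers.Length; c++)
        {
            consumers[c] = new Thread(() =>
            {
                // GetConsumingEnumerable blocks until an item arrives and
                // exits cleanly once CompleteAdding has been called.
                foreach (var task in queue.GetConsumingEnumerable())
                    Interlocked.Increment(ref processed);  // placeholder for real work
            });
            consumers[c].Start();
        }

        producer.Start();
        producer.Join();
        foreach (var t in consumers) t.Join();
        return processed;
    }

    static void Main() => Console.WriteLine(Run(1000, 100));
}
```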

What are your thoughts on these two approaches?

+1  A: 

Sounds like a job for Microsoft HPC Server 2008. Given that it's the number of tasks that's overwhelming, you need some kind of parallel process manager. That's what HPC Server is all about.

http://www.microsoft.com/hpc/en/us/default.aspx

W. Kevin Hazzard
This seems like a bit of overkill, though I could be wrong.
Adam Robinson
+2  A: 

Use the ThreadPool.

Then you can queue up everything and items will be run as threads become available to the pool without overwhelming the system. The only trick is determining the optimum number of threads to run at a time.
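One way to queue everything without overwhelming memory is to cap the number of in-flight work items with a semaphore; this is a sketch, not part of the answer above, and the cap of 100 and the trivial "work" are illustrative:

```csharp
using System;
using System.Threading;

static class PooledWork
{
    static int processed;

    // Queues work onto the ThreadPool while capping how many items are
    // in flight at once, so the producer never outruns memory.
    public static int Run(int totalTasks, int maxInFlight)
    {
        processed = 0;
        using var slots = new SemaphoreSlim(maxInFlight);
        using var done = new CountdownEvent(totalTasks);

        for (int i = 0; i < totalTasks; i++)       // stand-in for reading tasks from the DB
        {
            slots.Wait();                          // block while maxInFlight items are pending
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { Interlocked.Increment(ref processed); }  // placeholder for real work
                finally { slots.Release(); done.Signal(); }
            });
        }

        done.Wait();                               // wait for every work item to finish
        return processed;
    }

    static void Main() => Console.WriteLine(Run(1000, 100));
}
```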

Joel Coehoorn
+4  A: 

Use a ThreadPool with a bounded queue to avoid overwhelming the system.

If each of your worker tasks is CPU bound then configure your system initially so that the number of threads in your system is equal to the number of hardware threads that your box can run.

If your tasks aren't CPU bound then you'll have to experiment with the pool size to get an optimal solution for your particular situation

You may have to experiment with either approach to get to the optimal configuration.

Basically, test, adjust, test, repeat until you're happy.
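The starting configuration described above might look like this (a sketch; the helper name is mine, and note that `SetMinThreads`/`SetMaxThreads` also take an I/O-completion thread count, which this leaves untouched):

```csharp
using System;
using System.Threading;

static class PoolTuning
{
    // Pins the worker-thread count to n as a starting configuration;
    // returns false if the runtime rejects the values.
    public static bool Configure(int n)
    {
        // Read the current I/O-completion counts so we only change workers.
        ThreadPool.GetMinThreads(out _, out int ioMin);
        ThreadPool.GetMaxThreads(out _, out int ioMax);
        return ThreadPool.SetMinThreads(n, ioMin)
            && ThreadPool.SetMaxThreads(n, ioMax);
    }

    static void Main()
    {
        int n = Environment.ProcessorCount;        // starting point for CPU-bound work
        Console.WriteLine(Configure(n));
        // Measure throughput here, then raise n and re-measure until gains flatten out.
    }
}
```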

Glen
Using the ThreadPool for embarrassingly parallel tasks will not yield the best results. The best results would come from using a fixed number of worker threads (equal to the number of cores in the machine) and possibly a producer thread. This is what the Task Parallel Library does.
Pop Catalin
What do you see as the drawback to using a ThreadPool in this instance? Having the number of threads equal to the number of cores will in most cases yield sub-optimal results if any of the tasks has to wait: cache misses, other I/O operations, etc. The only way to get to an optimal number of threads is to experiment and measure your results for your particular task.
Glen
@Pop Catalin: You can set a fixed size for the ThreadPool.
Joel Coehoorn
"Having the number of threads equal to the number of cores will in most cases yield sub-optimal results if any of the tasks has to wait". That's the definition of embarrassingly parallel: no tasks have to wait, and in this case using a number of threads equal to the number of cores is ideal. On the other hand, if waiting is involved (locks or I/O), then the thread pool is a better option.
Pop Catalin
From the description of the problem and the further additions to this thread, it sounds like the following scenario is true: the system is basically always at max processing power but has no dependencies on order of execution. Additionally, it sounds like there is no shared data or need for locking. In this particular case, a thread pool fed by a queue, as Glen suggested, would probably be the most efficient. Regardless of efficiency, there is a certain value to ease. A thread pool is about as simple an approach to multithreading as you can choose. Easy is quite often good.
Serapth
@Serapth: You may find interesting the article I linked in my recent edit to my answer. Reader's Digest version: efficiently using the ThreadPool is hard, and Parallel.For is still a bit faster (and a lot simpler).
Daniel Pratt
When you say "experiment with the pool size", does it mean using GetMinThreads() and GetMaxThreads() of the ThreadPool class?
Nathan Wong
@Glen, also I am not sure why we need to experiment with the pool size when the task is not CPU bound (I assume you mean I/O bound, from reading from the database in this case). If the task is I/O bound, there is not much we can do besides having one task producer and one task processor (consumer), and the time it takes to process will be bound by the I/O time, isn't it?
Nathan Wong
@Pop Catalin Your definition of "embarrassingly parallel" and mine differ then. My definition is "something where no / little effort is required to split a task into a number of parallel tasks". Waiting doesn't enter into the equation at all. Each parallel task can wait if it needs to.
Glen
@Nathan Wong By experimenting with the pool size I mean setting the max and min number of threads. Start with SetMinThreads(N) and SetMaxThreads(N) where N = number of cores. Test your throughput. Then increase N until you aren't getting any performance gains.
Glen
@Nathan Wong If you're I/O bound you may get improvements by increasing the number of threads. You also may not. It all depends on your hardware. The cardinal rule of performance tuning is to measure. So record your performance, change something, measure again, change something, measure again. Repeat until you get the performance you need.
Glen
+3  A: 

I've not had the opportunity to actually use PLINQ; however, I do know that PLINQ (like vanilla LINQ) is based on IEnumerable. As such, I think this might be a case where it would make sense to implement the task producer via C# iterator blocks (i.e., the yield keyword).

Assuming you are not doing any operations where the entire set of tasks must be known in advance (e.g. ordering), I would expect that PLINQ would only consume as many tasks as it could process at once. Also, this article references some strategies for controlling just how PLINQ goes about consuming input (the section titled "Processing Query Output").
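The iterator-block idea might look roughly like this (a sketch; the method names and the trivial computation are mine, not from any documented example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PlinqPipeline
{
    // Iterator block: tasks are produced lazily, one at a time, so the
    // full task set never has to exist in memory at once.
    public static IEnumerable<int> ReadTasks(int total)   // stand-in for a DB cursor
    {
        for (int i = 0; i < total; i++)
            yield return i;
    }

    public static long Run(int total) =>
        ReadTasks(total)
            .AsParallel()
            .WithDegreeOfParallelism(Environment.ProcessorCount)
            .Select(task => (long)task * 2)       // placeholder computation
            .Sum();                               // stands in for storing each result

    static void Main() => Console.WriteLine(Run(1000));   // 2 * (0 + 1 + ... + 999) = 999000
}
```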

EDIT: Comparing PLINQ to a ThreadPool.

According to this MSDN article, efficiently allocating work to a thread pool is not at all trivial, and even when you do it "right", using the TPL generally exhibits better performance.
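For comparison, the equivalent TPL version could be sketched like this (again illustrative; `Parallel.ForEach` pulls from the enumerable on demand, so only in-flight tasks are held in memory):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class TplPipeline
{
    static IEnumerable<int> ReadTasks(int total)
    {
        for (int i = 0; i < total; i++) yield return i;   // stand-in for a DB cursor
    }

    // Parallel.ForEach handles work distribution itself; the options cap
    // concurrency at one task per core as a starting point.
    public static int Run(int total)
    {
        int processed = 0;
        Parallel.ForEach(
            ReadTasks(total),
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            task => Interlocked.Increment(ref processed));  // placeholder for real work
        return processed;
    }

    static void Main() => Console.WriteLine(Run(1000));
}
```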

Daniel Pratt
Does anyone know whether, if I use PLINQ, it adaptively adjusts the number of threads? Say on an 8-core machine, initially there is no other process running, so PLINQ creates 8 threads (possibly 7 workers and 1 producer), but later on there are some other processes running on the machine, so the number of worker/producer threads will be reduced?
Nathan Wong
I've not read anything that suggests PLINQ (or TPL on which PLINQ is based) does this. On the other hand, given TPL's dynamic work distribution strategies, reducing the number of threads would probably be an anti-optimization in most cases.
Daniel Pratt
A: 

In order to give a good answer we need a few questions answered.

Is each individual task parallelizable? Or is each task the product of a parallelizable main task?

Also, is it the number of tasks that would cause the system to run out of memory, or is it the quantity of data each task holds and processes that would cause the system to run out of memory?

Turbo
Basically, each task is a scientific computation on a different set of data. Each individual task can possibly be further parallelized at the algorithm level. The number of tasks will cause the system to run out of memory, not the quantity of data in each task.
Nathan Wong
Number of tasks... that's a lot of tasks :-) So it sounds like you have something that creates tasks to get executed, and tasks can create sub-tasks (parallelization of an algorithm) to also get executed. What I might do is utilize a bounded queue, where only N tasks can exist in the queue, and the queue blocks on adding when the task count == N. Then you can have X task processors pulling from the queue. As task processors pull from the queue, the queue opens up and allows a blocked producer to add a new task, and so on, and so forth.
Turbo
A: 

Sounds like Windows Workflow Foundation (WF) might be a good thing to use to do this. It might also give you some extra benefits such as pause/resume on your tasks.

Erich Mirabal
Since all WF tasks run on the same thread, it is largely useless for fine-grained parallelism.
Gabe