views:

253

answers:

4

I recently had a play around with Hadoop and was impressed with it's scheduling, management, and reporting of MapReduce jobs. It appears to make the distribution and execution of new jobs quite seamless, allowing the developer to concentrate on the implementation of their jobs.

I am wondering if anything exists in the Java domain for the distributed execution of jobs that are not easily expressed as MapReduce problems? For example:

  • Jobs that require task co-ordination and synchronization. For example, they may involve sequential execution of tasks yet it is feasible to execute some tasks concurrently:

                   .-- B --.
            .--A --|       |--.
            |      '-- C --'  |
    Start --|                 |-- Done
            |                 |
            '--D -------------'
    
  • CPU intensive tasks that you'd like to distribute but don't provide any outputs to reduce - image conversion/resizing for example.

So is there a Java framework/platform that provides such a distributed computing environment? Or is this sort of thing acceptable/achievable using Hadoop - and if so are there any patterns/guidelines for these sorts of jobs?

A: 

I guess you are looking for a workflow engine for CPU intensive tasks (also know "scientific workflow", e.g. http://www.extreme.indiana.edu/swf-survey). But I'm not sure how distributed do you want it to be. Usually all workflow engines have a "single point of failure".

Alexey Kalmykov
A: 

I believe quite a few problems can be expressed as map-reduce problems.

For problems that you can't modify to fit the structure your can look at setting up your own using Java's ExecutorService. But it will be limited to one JVM and it will be quite low level. It will allow for easy coordination and synchronization however.

Fried Hoeben
+1  A: 

Take a look at Quartz. I think it supports stuff like managing jobs remotely and clustering several machines to run jobs.

Dave Paroulek
+1 excellent framework
JamesC
+1  A: 

I have since found Spring Batch and Spring Batch Integration which appear to address many of my requirements. I will let you know how I get on.

teabot