
I am looking for something that will make it easy to run (correctly coded) embarrassingly parallel JVM code on a cluster (so that I can use Clojure + Incanter).
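
For concreteness, here is a minimal Java sketch of what "embarrassingly parallel" means. Each task depends only on its own input, so tasks can run in any order with no shared state. This sketch only fans the work out across threads in a single JVM using the standard `ExecutorService`. The `simulate` method is a hypothetical stand-in for real per-item work. A cluster tool of the kind asked about would distribute the same independent tasks across machines instead of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EmbarrassinglyParallel {
    // Hypothetical per-item work: the result depends only on `seed`,
    // so no coordination between tasks is needed.
    static long simulate(int seed) {
        long acc = seed;
        for (int i = 0; i < 1000; i++) {
            acc = (acc * 31 + i) % 1_000_003;
        }
        return acc;
    }

    // Submit one independent task per input and gather the results
    // in the same order as the inputs.
    public static List<Long> runAll(List<Integer> seeds, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int seed : seeds) {
                futures.add(pool.submit(() -> simulate(seed)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that task finishes
            }
            return results;
        } finally {
            pool.shutdown(); // let worker threads exit once work is done
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> results = runAll(List.of(1, 2, 3, 4, 5, 6, 7, 8), 4);
        System.out.println(results);
    }
}
```

`runAll` reads the futures back in submission order, so results line up with the inputs even though the tasks may finish in any order.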

I have used Parallel Python in the past to do this. We have a new PBS cluster, and our admin will soon set up IPython nodes that use PBS as the backend. Both of these systems make it almost a no-brainer to run certain types of code on a cluster.

I made the mistake of using Hadoop in the past. It is just not suited to the kind of data I use, and its latency meant that even small runs took 1-2 minutes.

Is JPPF or GridGain better for what I need? Does anyone here have experience with either? Is there anything else you would recommend?

A: 

Look at Skandium

Pangea
Looks promising, but GPL is a pain in the ass.
Robert Harvey
A: 

I have heard that Scala has better support for concurrency than Java does.

mlaverd
Question refers to Clojure, not Scala. Both have good concurrency libraries, and both compile to Java bytecode, so JVM-aware tools are the relevant answer.
Stuart Sierra
A: 

Clojure is reported to work on Terracotta, subject to some patching.

Stuart Sierra
A: 

Check out cascalog - http://github.com/nathanmarz/cascalog

simon-says