How do you execute a Unix shell command (e.g., an awk one-liner) on a cluster in parallel (step 1) and collect the results back on a central node (step 2)?

Update: I've just found http://blog.last.fm/2009/04/06/mapreduce-bash-script, which seems to do exactly what I need.

+2  A: 

If all you're trying to do is fire off a bunch of remote commands, you could just use Perl. You can open an ssh command as a pipe and read its output back into Perl. (You will, of course, need to set up SSH keys to allow password-less access.)

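# Run myScript on hostB and stream its output back through a pipe.
# The list form avoids the local shell; single quotes keep @hostB literal.
open(my $remote, '-|', 'ssh', 'user@hostB', 'myScript')
    or die "Cannot start ssh: $!";
while (my $line = <$remote>) {
    print $line;
}
close($remote) or warn "ssh exited with status $?";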

You'd want to loop over your machine names and open one such pipe for each. After that, just do non-blocking reads on the filehandles to pull back the data as it becomes available.
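A rough sketch of that loop, in case it helps. It is one possible way to do it, not the only one: instead of putting the handles into non-blocking mode, it uses the core IO::Select module with sysread, so it only reads from a handle once select reports data on it. The run_parallel helper, the awk command, the /var/log/syslog path and the "user" login are all made up for illustration; the host names are taken from the command line.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical remote command; replace it with your own one-liner.
my $remote = q{awk '{ n++ } END { print n }' /var/log/syslog};
my @hosts  = @ARGV;    # e.g. perl fanout.pl hostA hostB hostC

# Start one command per host, then read from whichever pipe has data.
# $make_cmd turns a host name into the argument list to execute.
sub run_parallel {
    my ($hosts, $make_cmd) = @_;
    my $sel = IO::Select->new;
    my (%host_of, %output);

    for my $host (@$hosts) {
        # List-form pipe open: the command starts right away in a child process.
        open(my $fh, '-|', $make_cmd->($host))
            or do { warn "cannot start command for $host: $!\n"; next };
        $host_of{$fh}  = $host;
        $output{$host} = '';
        $sel->add($fh);
    }

    # Step 2: collect output until every pipe has reached end-of-file.
    while ($sel->count) {
        for my $fh ($sel->can_read) {
            my $n = sysread($fh, my $buf, 4096);
            if ($n) {
                $output{ $host_of{$fh} } .= $buf;
            } else {    # 0 means EOF, undef means read error
                $sel->remove($fh);
                close $fh;
            }
        }
    }
    return \%output;
}

# BatchMode=yes makes ssh fail instead of prompting if the keys are not set up.
my $results = run_parallel(\@hosts,
    sub { ('ssh', '-o', 'BatchMode=yes', "user\@$_[0]", $remote) });
print "== $_ ==\n$results->{$_}" for sort keys %$results;
```

Because every ssh is started before any reading happens, the remote commands run concurrently, and the output is grouped per host rather than interleaved line by line.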

Brian Roach
