Running MRToolkit hadoop jobs on AWS elastic map/reduce | ansaurus

Q:

Running MRToolkit hadoop jobs on AWS elastic map/reduce

Hi All,

Loving MRToolkit -- great to get away from Java while writing Hadoop jobs. It has become apparent that the library was written to interface with an EC2 cluster, and not with Amazon's Elastic MapReduce (EMR) service. Does anybody have insight into running jobs defined with the toolkit on EMR? It isn't obvious how to do this from the web interface, and I'd love to avoid the headache of setting up a cluster by hand on EC2.

I've looked into uploading files under the 'streaming' option (as that's what MRToolkit uses), but Amazon expects separate files for the mapper and the reducer, whereas typical MRToolkit style defines them in a single file as subclasses of the predefined Base(Map|Reduce) classes.
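To make the mismatch concrete: Hadoop streaming treats the mapper and the reducer as two separate commands that read lines on stdin and write tab-separated key/value lines on stdout. The sketch below is plain Ruby, not MRToolkit's API, and shows one file serving both roles by switching on a command-line argument. The word-count logic and the `map` / `reduce` argument names are made up for illustration, and whether the EMR web form accepts a command with arguments is not something this thread confirms.

```ruby
require 'stringio'

# Mapper role: emit one "word<TAB>1" line for every word on stdin.
def run_mapper(input, output)
  input.each_line do |line|
    line.split.each { |word| output.puts "#{word}\t1" }
  end
end

# Reducer role: streaming delivers lines sorted by key, so counts for the
# same key arrive together and can be summed in a single pass.
def run_reducer(input, output)
  current = nil
  total = 0
  input.each_line do |line|
    key, value = line.chomp.split("\t", 2)
    if key != current
      output.puts "#{current}\t#{total}" unless current.nil?
      current = key
      total = 0
    end
    total += value.to_i
  end
  output.puts "#{current}\t#{total}" unless current.nil?
end

# One file, two roles: the (hypothetical) first argument picks the role.
case ARGV.first
when 'map'    then run_mapper($stdin, $stdout)
when 'reduce' then run_reducer($stdin, $stdout)
end
```

Saved under a hypothetical name such as wordcount.rb, the same file could be named as both the mapper command (`wordcount.rb map`) and the reducer command (`wordcount.rb reduce`) wherever the streaming setup allows arguments.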

Thanks much for any thoughts.

Isaac

A:

It's doable, but not through the web GUI.

1. Download and install the Ruby Client
2. Create your cluster: elastic-mapreduce --create --alive [params to size cluster]
3. Confirm your Elastic Map Reduce Master security group has port 22 open
4. SSH into your master node
5. Use git / scp to copy over your application code
6. Run your app
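
The steps above can be sketched as a small dry-run script that only builds and prints the command lines, so nothing is launched by accident. `--create` and `--alive` come from the answer. The sizing flags (`--name`, `--num-instances`, `--instance-type`), the `hadoop` login user, and every value shown (cluster name, key file, host name, app directory, job file) are assumptions or placeholders, so check them against `elastic-mapreduce --help` and your own setup.

```ruby
require 'shellwords'

# Step 2: start a long-running job flow. --create and --alive are from the
# answer; the remaining flags are assumed sizing parameters.
def create_cluster_command(name:, instances:, instance_type:)
  ['elastic-mapreduce', '--create', '--alive',
   '--name', name,
   '--num-instances', instances.to_s,
   '--instance-type', instance_type]
end

# Step 5: copy the application directory to the master node over SSH
# (which is why port 22 must be open). The 'hadoop' user is an assumption.
def copy_code_command(key_file:, app_dir:, master_host:, user: 'hadoop')
  ['scp', '-i', key_file, '-r', app_dir, "#{user}@#{master_host}:"]
end

# Steps 4 and 6: log in to the master node and run the job there.
def ssh_command(key_file:, master_host:, remote_command:, user: 'hadoop')
  ['ssh', '-i', key_file, "#{user}@#{master_host}", remote_command]
end

# Render each argv array as a shell-safe string without executing it.
def dry_run(commands)
  commands.map { |argv| Shellwords.join(argv) }
end

if __FILE__ == $PROGRAM_NAME
  # All values below are hypothetical placeholders.
  puts dry_run([
    create_cluster_command(name: 'mrtoolkit-test', instances: 3,
                           instance_type: 'm1.small'),
    copy_code_command(key_file: 'my-key.pem', app_dir: 'my_app',
                      master_host: 'master.example.invalid'),
    ssh_command(key_file: 'my-key.pem',
                master_host: 'master.example.invalid',
                remote_command: 'ruby my_app/my_job.rb')
  ])
end
```

Because the job flow is created with `--alive`, it keeps running (and billing) until it is terminated explicitly.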

Ryan Cox 2010-08-05 17:52:58

Ryan, thanks for the pointers. I've noticed that EMR lets you specify input and output buckets/directories on S3 -- do you know if there's a way to leverage that functionality with MRToolkit instead of manually copying the data over (with something like s3cmd)? Again, thanks much. Isaac

isparling 2010-08-07 00:34:45

Just use the s3n:// syntax for your input and output paths, e.g. s3n://my-input-bucket/prod/logs... Hadoop understands the s3n scheme and can pull the data directly from S3.
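
As a rough illustration of the path syntax, the sketch below builds s3n:// URIs and places them in the standard Hadoop streaming `-input` / `-output` options. The helper names, the output bucket, and the mapper/reducer commands are hypothetical, and how MRToolkit forwards its input and output directories to Hadoop is not shown in this thread, so treat this as the shape of the paths rather than MRToolkit usage.

```ruby
# Build an s3n:// URI from a bucket name and an optional key prefix.
def s3n_uri(bucket, prefix = '')
  raise ArgumentError, 'bucket must not be empty' if bucket.to_s.empty?
  path = prefix.to_s.sub(%r{\A/+}, '') # drop leading slashes from the prefix
  "s3n://#{bucket}/#{path}"
end

# Standard Hadoop streaming options, with S3 locations for input and output.
def streaming_args(input:, output:, mapper:, reducer:)
  ['-input', input, '-output', output, '-mapper', mapper, '-reducer', reducer]
end

if __FILE__ == $PROGRAM_NAME
  # Bucket names and script names are hypothetical placeholders.
  args = streaming_args(
    input:   s3n_uri('my-input-bucket', 'prod/logs'),
    output:  s3n_uri('my-output-bucket', 'results'),
    mapper:  'mapper.rb',
    reducer: 'reducer.rb'
  )
  puts args.join(' ')
end
```

Hadoop normally refuses to write to an output directory that already exists, so a fresh output prefix per run is the safer choice.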

Ryan Cox 2010-08-07 01:16:12
