+2  Q: 

MapReduce on AWS

Anybody played around with MapReduce on AWS yet? Any thoughts? How's the implementation?

+3  A: 

It's easy to get started.

Here's an FAQ: http://aws.amazon.com/elasticmapreduce/faqs/

And here's the Getting Started Guide: http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/

If you have an EC2 account already, you can enable MapReduce and have a sample application up and running in less than 10 minutes using the AWS Management Console.

I did the pre-packaged Word Count sample application, which returns a count of each word contained in about 20 MB of text. You can provision up to 20 instances to run concurrently, though I just used 2 instances and the job completed in about 3 minutes.

The job returns a 300 KB alphabetized list of words and how often each word appears in the sample corpus.

I really like that MapReduce jobs can be written in my choice of Perl, Python, Ruby, PHP, C++, R, or Java. The process was painless and straightforward, and the interface gives good feedback on the status of your instances and the job flow.
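To give a feel for what a word-count job looks like in one of those languages, here is a minimal Python sketch. It is not the code of the pre-packaged sample; the function names (`map_words`, `reduce_counts`, `word_count`) are made up for illustration, and the in-process `sorted` call only stands in for the shuffle/sort step the framework would do between map and reduce.

```python
import itertools
import re

# Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
WORD_RE = re.compile(r"[a-z']+")

def map_words(lines):
    """Mapper: emit (word, 1) for every word on every input line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield word, 1

def reduce_counts(pairs):
    """Reducer: sum the counts for each word.

    Sorting here imitates the framework's shuffle/sort phase, which
    delivers all pairs for a given key to the reducer together.
    """
    ordered = sorted(pairs)
    for word, group in itertools.groupby(ordered, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

def word_count(lines):
    """Run map then reduce locally and return an alphabetized list."""
    return list(reduce_counts(map_words(lines)))

if __name__ == "__main__":
    # Tiny in-memory input, purely illustrative.
    for word, total in word_count(["the cat sat", "The dog sat"]):
        print(f"{word}\t{total}")
```

In a real streaming job the mapper and reducer would normally be separate scripts reading stdin and writing tab-separated lines to stdout; they are combined here only so the logic can be tried locally.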

Be aware that, since AWS charges for a full hour when an instance is created, and since the MapReduce instances are automatically terminated at the end of the job flow, the cost of multiple fast-running job flows can add up quickly.

For example, if I create a job flow that uses 20 instances and returns results in 15 minutes, and then re-run the job flow 3 more times, I'll be charged for 80 instance-hours (4 runs x 20 instances x 1 billed hour each), even though the 20 instances only ran for 1 hour in total.
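The arithmetic behind that example can be sketched in a few lines of Python. This assumes the billing model described in this answer (every run starts fresh instances, and each instance is charged in whole hours); the function names are hypothetical and no actual prices are involved.

```python
import math

def billed_instance_hours(instances, minutes_per_run, runs):
    """Instance-hours charged when each run's instances are rounded up
    to whole hours (the per-hour billing model assumed in this answer)."""
    hours_per_instance = math.ceil(minutes_per_run / 60)
    return instances * hours_per_instance * runs

def used_instance_hours(instances, minutes_per_run, runs):
    """Instance-hours the machines were actually running."""
    return instances * minutes_per_run * runs / 60

# 20 instances, 15 minutes per run, 4 runs in total (1 original + 3 re-runs)
billed = billed_instance_hours(20, 15, 4)
used = used_instance_hours(20, 15, 4)
print(f"billed: {billed} instance-hours, used: {used} instance-hours")
```

With these numbers, 80 instance-hours are billed for 20 instance-hours of actual work, so three quarters of the charge is for idle rounding.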

mb
That's our experience as well, plus the slow startup time. It's only worth it if you have jobs that are likely to run for several hours.
Kevin Peterson
A: 

It is very convenient, because you don't have to administer your own cluster. You just pay per use, so I think it is a good idea if you have a job that only needs to run once in a while. We run Amazon M/R only once a month, so it is worth it for us.

But, as far as I can tell, the drawback of Amazon M/R is that you can't tell which OS is running, or even its version. So I had problems running C++ code that was compiled with g++ 4.44, some of the images don't include the cURL library, etc.
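One way to work around not knowing what is on the nodes is to run a small probe script as a job step before the real job. The sketch below is only an illustration and assumes Python is available on the instances; `describe_environment` is a made-up helper, and it uses only standard-library calls to report the platform string and whether a shared library or tool can be found.

```python
import ctypes.util
import platform
import shutil

def describe_environment(libraries=("curl",), tools=("g++",)):
    """Collect basic facts about the machine this script runs on.

    Library and tool names are examples; a value of None means the
    library or tool could not be located on this machine.
    """
    report = {
        "platform": platform.platform(),   # OS name, release, and more
        "machine": platform.machine(),     # e.g. the CPU architecture
        "python": platform.python_version(),
    }
    for name in libraries:
        report[f"lib:{name}"] = ctypes.util.find_library(name)
    for name in tools:
        report[f"tool:{name}"] = shutil.which(name)
    return report

if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print(f"{key}\t{value}")
```

The output goes wherever the step's stdout ends up, so it can be checked before deciding how to compile or package native code.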

If you don't need any special library, I would say go for it.

sagie