amazon-emr

Amazon Elastic Map Reduce - Keep Server alive?

I am testing jobs in EMR and each and every test takes a lot of time to start up. Is there a way to keep the server/master node alive in Amazon EMR? I know this can be done with the API. But, I wanted to know if this can be done in the aws console? ...

Hadoop streaming maximum line length

I'm working on a Hadoop streaming workflow for Amazon Elastic Map Reduce and it involves serializing some binary objects and streaming those into Hadoop. Does Hadoop have a maximum line length for streaming input? I started to just test with larger and larger lines but figured I would ask here first. ...

How can I use Hive on top of Amazon Elastic Mapreduce to process data in Amazon Simple DB?

I have a lot of data in an Amazon Simple DB Domain. I want to start Hive on Elastic Map Reduce (on top of hadoop) and somehow, either import data from simpledb or, connect to simpledb and run hiveql queries on it. I have having issues importing the data. Any pointers? ...

Hadoop streaming and AMAZON EMR

I have been attempting to use Hadoop streaming in AMAZON EMR to do a simple word count for a bunch of text files. In order to get a handle on hadoop streaming and on amazon's EMR I took a very simplified data set too. Each text file had only one line of text in it (the line could contain arbitrarily large number of words). The mapper is...