Is it possible to write map/reduce jobs for Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/) using .NET languages? In particular I would like to use C#.

Preliminary research suggests not. The above URL's marketing text suggests you have a "choice of Java, Ruby, Perl, Python, PHP, R, or C++", without mentioning .NET languages. This Amazon thread (http://developer.amazonwebservices.com/connect/thread.jspa?messageID=136051 -- "Support for C# / F# map/reducers") explicitly says that "currently Amazon Elastic MapReduce does not support Mono platform or languages such as C# or F#."

The above suggests that it can't be done. I'm wondering if there are any workarounds, though. For example, can I modify the Elastic MapReduce machine image for my account, and install Mono on there?

An alternative, suggested by Amazon FAQs "Using Other Software Required by Your Jar" (http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?CHAP_AdvancedTopics.html) and "How to Use Additional Files and Libraries With the Mapper or Reducer" (http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?addl_files.html), is to make the first step of the Map/Reduce job be to install Mono on the local instance. That sounds kind of inefficient, but maybe it could work?

Maybe a saner alternative would be to try to forgo the convenience of Elastic MapReduce, and manually set up my own Hadoop cluster on EC2. Then I assume I could install Mono without difficulty.

+1  A: 

You should be able to use the VB.NET library from any .NET language, including C#.

Reed Copsey
There's also a C# version (http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2307). However, neither of these looks to me like it lets you write .NET mappers/reducers. Instead, their (shared) purpose seems to be providing an API alternative to manually using Amazon's web-based Elastic MapReduce control panel, whose main purpose is to start, stop, and configure your MapReduce jobs. I don't think that will help you actually implement a mapper or reducer in VB/C#, though perhaps I'm missing something.
Chris
+1  A: 

A possible workaround would be to use Hadoop streaming and compile your C# code into native code with Mono's Ahead-of-Time (AOT) compiler (see http://www.mono-project.com/AOT). I guess the resulting binary could then be run from S3 the same way a C++ program could.
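Whatever the C# code is compiled to, Hadoop streaming only asks the mapper executable to read input records from stdin and write `key<TAB>value` lines to stdout. The sketch below illustrates that contract with a hypothetical word-count mapper, written in Python purely for illustration; an AOT-compiled C# binary would have to do the same thing. The word-count logic and the function names are assumptions, not anything Elastic MapReduce prescribes.

```python
import sys
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical word-count mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            # Hadoop streaming treats the text before the first tab as the key
            # and the rest of the line as the value.
            yield f"{word.lower()}\t1"


def main() -> None:
    # Streaming feeds input records on stdin and collects mapper output from stdout.
    for record in map_lines(sys.stdin):
        sys.stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

Locally, something like `echo "Hello hello world" | python mapper.py` (the file name is hypothetical) should print `hello	1` twice and `world	1` once.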

The answer by Reed Copsey is not correct. The VB.NET library is for creating jobs and for starting and stopping them; it is not for writing the code that actually runs inside the Hadoop jobs.

Teun D
A: 

Elastic MapReduce now has a "bootstrap actions" feature, which Amazon currently explains as follows:

A bootstrap action is a mechanism that allows you to run a script on Elastic MapReduce instances prior to Hadoop starting. Bootstrap action scripts are stored in Amazon S3 and passed to Amazon Elastic MapReduce when creating a new job flow. Bootstrap action scripts are downloaded from Amazon S3 and executed on each instance before the job flow is executed.

Bootstrap action scripts can be written in any language already installed on the job flow instance, including Ruby, Python, Perl, and bash.

(See http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?introduction.html)

One suggested use of this is to install software on your cluster machines. You could potentially use it to install a .NET runtime environment on your cluster machines (probably Mono rather than Microsoft's, because I imagine all the Elastic MapReduce machines run Linux). I'm not sure how hard the unattended install would be. Any ideas? Once that is done, you could call out to your .NET mappers/reducers using Hadoop streaming, which Elastic MapReduce does seem to support.
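On the reducer side, Hadoop streaming sorts the mapper output by key before handing it to the reducer on stdin, so all lines sharing a key arrive consecutively. Below is a minimal Python sketch of a reducer that follows this contract. It is a hypothetical word count, paired with a mapper that emits `word<TAB>1`, and the function names are made up for illustration. A .NET reducer run under Mono would have to read and write the same line format.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Split each 'key<TAB>count' line into a (key, count) pair, skipping blank lines."""
    for line in lines:
        if not line.strip():
            continue
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, int(value)


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; relies on the input already being sorted by key."""
    for key, group in groupby(parse_records(lines), key=lambda kv: kv[0]):
        yield f"{key}\t{sum(count for _, count in group)}"


def main() -> None:
    # Streaming delivers the sorted mapper output on stdin and reads results from stdout.
    for record in reduce_lines(sys.stdin):
        sys.stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

With Mono installed by a bootstrap action, the streaming step's `-mapper` and `-reducer` options could then presumably name the .NET programs directly (for example something like `mono Reducer.exe`, where the assembly name is hypothetical). This is untested on Elastic MapReduce.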

Chris