tags:
views: 50
answers: 1

What is the easiest to use distributed map reduce programming system?

For example, in a distributed datastore containing many users, each with many connections, suppose I wanted to count the total number of connections:

Map:
for all records of type "user"
do for each user
    count number of connections
    return connection_count_for_one_user

Reduce:
reduce (connection_count_for_one_user)
    total_connections += connection_count_for_one_user
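In plain, single-machine Python terms (just a local sketch of the idea, not a real distributed job, with a made-up record format), that map/reduce looks like:

```python
from functools import reduce

# Hypothetical sample datastore: each "user" record lists its connections.
users = [
    {"type": "user", "connections": ["a", "b"]},
    {"type": "user", "connections": ["c"]},
    {"type": "user", "connections": ["a", "c", "d"]},
]

# Map: emit one connection count per record of type "user".
counts = [len(u["connections"]) for u in users if u["type"] == "user"]

# Reduce: fold the per-user counts into a grand total.
total_connections = reduce(lambda acc, c: acc + c, counts, 0)

print(total_connections)  # 2 + 1 + 3 = 6
```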

Is there any mapreduce system that lets me program in this way?

+1  A: 

Well, I'll take a stab at making some suggestions, but your question isn't entirely clear.

So how are you storing your data? The storage mechanism is separate from how you apply MapReduce algorithms to the data. I'm going to assume you are using the Hadoop Distributed File System (HDFS).

The problem you illustrate actually looks very similar to the typical Hadoop MapReduce word-count example; instead of counting words, you are counting connections per user.

Some of the options you have for applying MapReduce to data stored on HDFS are:

  • Java framework - the native Hadoop MapReduce API; good if you are comfortable with Java.
  • Pig - a high-level scripting language (Pig Latin) that compiles down to MapReduce jobs.
  • Hive - a data warehousing solution for Hadoop that provides a SQL-like interface (HiveQL).
  • Hadoop streaming - allows you to write mappers and reducers in pretty much any language that can read and write standard input/output.
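To give a flavour of the streaming option, here is a rough mapper/reducer pair in Python. It assumes a hypothetical input format of one user record per line, `user_id<TAB>comma-separated-connections`; this is a sketch of the streaming pattern, not a drop-in job:

```python
import sys

def mapper(lines):
    # Each input line is a hypothetical "user_id<TAB>conn1,conn2,..." record.
    # Emit everything under a single key so all counts meet at one reducer.
    for line in lines:
        user_id, _, conns = line.rstrip("\n").partition("\t")
        count = len(conns.split(",")) if conns else 0
        yield f"total\t{count}"

def reducer(pairs):
    # Hadoop streaming delivers mapper output sorted by key; here there is
    # only one key ("total"), so the reducer simply sums the values.
    total = 0
    for pair in pairs:
        _, _, value = pair.partition("\t")
        total += int(value)
    yield f"total\t{total}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hadoop would invoke this script twice, once per phase, e.g.:
    #   hadoop jar hadoop-streaming.jar -mapper "count.py map" \
    #       -reducer "count.py reduce" -input users -output totals
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Chaining the two functions locally (`reducer(mapper(lines))`) is a handy way to test the logic before submitting a real job.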

Which is easiest?

Well, that all depends on what you feel comfortable with. If you know Java, take a look at the standard Java framework. If you are used to scripting languages, you could use Pig or streaming. If you know SQL, you could look at using HiveQL to query data on HDFS. I would look at the documentation for each as a starting point.

Binary Nerd
Ok, thanks, I'll take a look at these
Zubair
Hive and Pig look promising!
Zubair