views:

113

answers:

3
+2  Q: 

Hadoop beginners

Hi,

I'm trying to practice some data mining algorithms on Hadoop. Can I do it with HDFS alone, or do I need to use sub-projects like Hive/HBase/Pig?

Thanks, ram.

+3  A: 

I've found a university site with some exercises and solutions for MapReduce that build only on Hadoop:

http://www.umiacs.umd.edu/~jimmylin/Cloud9/docs/index.html

Additionally there are courses from Yahoo and Google:

http://developer.yahoo.com/hadoop/tutorial/

http://code.google.com/edu/parallel/index.html

All these courses work on plain Hadoop, to answer your question.

Thomas Koch
+1 for Yahoo. I'd take the simple Yahoo tutorials and expand on them: make the input files MUCH bigger, change the map/reduce functions, go from a single instance to a small cluster, and keep building on what you have done previously.
Ralph Willgoss
+2  A: 

I would also recommend the UMD site. However, it looks like you are completely new to Hadoop, so I would recommend the book "Hadoop: The Definitive Guide" by Tom White. It's a bit dated (written for version 0.18 rather than the latest 0.20+). Read it, do the examples, and you should be in a better place to judge how to structure your project.

monksy
+1  A: 

Start with plain MapReduce at the beginner level. You can try Pig/Hive/HBase at the next level.

You will not be able to appreciate Pig/Hive/HBase unless you struggle enough with plain MapReduce first.
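
To give a feel for what "plain MapReduce" means, here is a minimal sketch of the map, shuffle, and reduce steps for a word count. It is a local simulation in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and on a real cluster the framework does the grouping and distribution for you.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for what the framework does
    between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def run_job(lines):
    """Run map -> shuffle -> reduce over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["hadoop runs mapreduce", "mapreduce on hadoop"]))
```

Once this model is clear, swapping in a different mapper and reducer (for example, counting item pairs instead of words) is a reasonable way to start on simple data mining exercises before moving to a real cluster.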

Harsha Hulageri
+1. It definitely is worth the pain.
gnucom