Hi,
I'm trying to practice some data mining algorithms on Hadoop. Can I do it with HDFS and MapReduce alone, or do I need to use sub-projects like Hive/HBase/Pig?
Thanks, ram.
I've found a university site with some exercises and solutions for MapReduce that build only on Hadoop:
http://www.umiacs.umd.edu/~jimmylin/Cloud9/docs/index.html
Additionally there are courses from Yahoo and Google:
http://developer.yahoo.com/hadoop/tutorial/
http://code.google.com/edu/parallel/index.html
All these courses work on plain Hadoop, to answer your question.
I would also recommend the UMD site. However, it looks like you are completely new to Hadoop, so I would recommend the book "Hadoop: The Definitive Guide" by Tom White. It's a bit dated (written for the 0.18 version rather than the latest 0.20+). Read it, do the examples, and you should be in a better position to judge how to structure your project.
Start with plain MapReduce at the beginner level. You can try Pig/Hive/HBase at the next level.
You will not be able to appreciate Pig/Hive/HBase unless you have struggled enough with plain MapReduce.
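To give a feel for what "plain MapReduce" means before you touch a cluster, here is a minimal word-count sketch in ordinary Python. It only simulates the map, shuffle, and reduce steps locally and does not use the Hadoop API at all; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for illustration. On a real cluster the framework does the shuffle for you, and you would write the mapper and reducer against Hadoop's own interfaces.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its lines from HDFS.
    sample = ["hadoop stores data in HDFS", "MapReduce reads data from HDFS"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

Most data mining exercises come down to choosing what the mapper emits as key and value and what the reducer does with each group, so it is worth getting comfortable with this shape first.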