Cassandra or Hadoop Hive or MYSQL?

Q:

Hey. I am developing a web crawler. Which is better for storing the data: Cassandra, Hadoop Hive, or MySQL, and why? I have 1 TB of data collected over the past six months in my MySQL database. I need to index it and return search results as quickly as possible. I expect it to grow much larger, perhaps to 10 petabytes, because my crawlers are fast, so reads and writes need to be fast. I also need to integrate it with my PHP application.

A:

That depends on the details of your requirements, but I think HBase would be the best option in your case.
Using HBase as a web-crawler database is well documented: HBase is modeled on Google's Bigtable, and storing crawled web pages is exactly the use case described in the Bigtable whitepaper.
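The Bigtable paper's example table stores crawled pages under reversed-URL row keys (www.cnn.com becomes com.cnn.www), so pages from the same domain land in contiguous rows. Below is a minimal Python sketch of that key scheme, assuming rows are keyed by reversed hostname plus path and query. `webtable_row_key` is a hypothetical helper, not part of any HBase client library.

```python
from urllib.parse import urlsplit


def webtable_row_key(url: str) -> str:
    """Build a Bigtable/HBase-style row key from a crawled URL (hypothetical helper).

    Hostname labels are reversed (www.cnn.com -> com.cnn.www) so that pages
    from the same domain sort next to each other in the table.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"  # treat an empty path as the site root
    if parts.query:
        path += "?" + parts.query
    return reversed_host + path


if __name__ == "__main__":
    urls = [
        "http://www.cnn.com/index.html",
        "http://money.cnn.com/markets",
        "http://www.example.org/",
    ]
    # Sorting the keys groups all cnn.com pages together.
    for key in sorted(webtable_row_key(u) for u in urls):
        print(key)
```

Because rows are stored sorted by key, a scan over the prefix `com.cnn.` would then read one site's pages together.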

Wojtek 2010-08-17 22:32:45

Hi,

You can use Cassandra for storage together with Elasticsearch for search.

sirmak 2010-08-18 20:05:28

You're looking for something designed to find documents based on their content, so it should be built on an inverted index. I think the most natural fit would be Lucene.
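An inverted index maps each term to the documents that contain it, so a query only touches the postings lists for its own terms instead of scanning every crawled page. Here is a toy, in-memory Python sketch of the idea. Lucene's real index adds on-disk segments, relevance scoring, and much more; the class and method names here are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return IDs of documents containing every query term (AND query)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Start from the rarest term's postings to keep the intersection small.
        terms.sort(key=lambda t: len(self.postings.get(t, ())))
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("page1", "Cassandra is a distributed database")
    index.add("page2", "Lucene builds an inverted index")
    index.add("page3", "HBase is a distributed store modeled on Bigtable")
    print(index.search("distributed database"))  # {'page1'}
```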

See also this article about a Hadoop-Lucene stack for querying terabytes of documents.
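For corpora too large for one machine, the index can be built in the MapReduce style that Hadoop runs: mappers emit (term, document) pairs, the framework groups them by term, and reducers write each term's postings list. The following single-process Python sketch shows only that data flow. It does not use Hadoop's actual API, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Emit one (term, doc_id) pair per distinct term in the document.
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    # Produce a sorted, de-duplicated postings list for one term.
    return term, sorted(set(doc_ids))


def build_index(docs):
    pairs = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    docs = {
        "page1": "Hadoop runs MapReduce jobs",
        "page2": "Lucene queries an inverted index",
        "page3": "MapReduce can build an inverted index",
    }
    index = build_index(docs)
    print(index["inverted"])  # ['page2', 'page3']
    print(index["mapreduce"])  # ['page1', 'page3']
```

In a real cluster, each phase runs in parallel across many machines, and the resulting postings would feed a search index such as Lucene's.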

Ken Bloom 2010-08-20 03:48:07

ansaurus