Hey guys, I want to let people enter simple text search terms, run a Pig job (if that's the best fit; it's what I know best), and output the results (the TSV result files?) so I can show them in a web interface.

Is there anything that addresses this problem? Is anything known to link together the few disjointed pieces of the flow I am going for?

Thanks,

+1  A: 

Why don't you index the docs into Lucene or Solr? Then you can do text search in real time. Hadoop is designed for batch-oriented processing, which doesn't seem like what you want in this case.
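To illustrate why an index suits interactive search better than a batch scan, here is a minimal sketch of a toy inverted index in plain Python. It is not the Lucene or Solr API; the function names `build_index` and `search` and the sample documents are made up for illustration, and real engines add analysis, ranking, and on-disk storage on top of this idea.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Hadoop is designed for batch processing",
    2: "Solr builds on Lucene for text search",
    3: "Pig scripts run as Hadoop batch jobs",
}
index = build_index(docs)
```

The index is built once up front (that part could itself be a batch job), and each query afterwards is only a few set lookups and intersections, so it can be answered while the user waits instead of scanning every document again.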

bajafresh4life
Never done a Solr index. I guess a proof of concept is the first pass I am going for, and implementing a Solr/Lucene index might take more start-up effort than I want right now.
ButtersB
Really? It might be easier to use Solr for a POC than it would be to use Hadoop for something it wasn't designed to do.
bajafresh4life
Well, Hadoop is nice because it is great at really crunching the data.
ButtersB
A: 

Well, it depends on your project's requirements. Does it need low latency, and how complex are the ad hoc searches? I think HBase + Pig might be a compromise solution: HBase can be used for real-time search (although its search functionality is not as powerful as an RDBMS's), and Pig for batch processing of large amounts of data.

zjffdu