My aim is to build an aggregator of news feeds and blog feeds, so as to make searching and tracking of entities in them easy. I have been looking at many of the solutions out there, such as Terrier, Lucene, SWISH-E, etc.

I could find only 2 comparison studies of these engines, and one of them is rather outdated. I want a search engine for a case where the data size is not too large but indexing is frequent, every 30 minutes or so. I feel Terrier is not a good tool for this case: it works better when the data size is large and the update frequency is low. Can somebody who has worked in the Information Retrieval field offer some advice?

+3  A: 

Lucene is well known and supported, so personally, that would be my first choice.

daveb
He seems to agree with you - http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
vinutheraj
What if my primary purpose is to do research, and Lucene does not offer much in the way of different similarity/scoring algorithms? Is there any academic open-source engine that could be tried out, other than Terrier?
vinutheraj
Lucene does allow quite a bit of manipulation around scoring. Some searches in the mailing lists should bring up some info.
daveb
There is almost nothing you can't do with Lucene. For better performance, try having a look at clucene (Lucene in C++) - http://clucene.sourceforge.net/
synhershko
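
To make the scoring discussion above concrete, here is a minimal sketch of a small index that is updated in place and accepts interchangeable scoring functions. It is a toy illustration only, not Lucene's or Terrier's API: `TinyIndex`, `tf_idf` and `bm25` are hypothetical names, tokenization is plain whitespace splitting, and the BM25 parameters `k1=1.2`, `b=0.75` are assumed conventional defaults.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy in-memory inverted index with a pluggable scoring function."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    def add(self, doc_id, text):
        # Re-adding an existing doc_id replaces the old version,
        # which is what a periodic feed refresh would need.
        self.remove(doc_id)
        tokens = text.lower().split()
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def remove(self, doc_id):
        if doc_id in self.doc_len:
            del self.doc_len[doc_id]
            for docs in self.postings.values():
                docs.pop(doc_id, None)

    def search(self, query, scorer):
        n_docs = len(self.doc_len)
        avg_len = sum(self.doc_len.values()) / n_docs if n_docs else 0.0
        scores = defaultdict(float)
        for term in query.lower().split():
            docs = self.postings.get(term, {})
            for doc_id, tf in docs.items():
                scores[doc_id] += scorer(
                    tf, len(docs), n_docs, self.doc_len[doc_id], avg_len
                )
        # Highest score first.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def tf_idf(tf, df, n_docs, doc_len, avg_len):
    # Simple TF-IDF variant; ignores document length.
    return tf * math.log(1 + n_docs / df)


def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    # Okapi BM25 with a non-negative IDF variant.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / norm
```

A feed poller could call `index.add(url, text)` on every refresh and then compare rankings with `index.search("lucene scoring", tf_idf)` versus `index.search("lucene scoring", bm25)`. A real engine adds persistence, analyzers and far more efficient deletes, so this only shows the shape of the idea.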