Hello

I would like to store data retrieved hourly from RSS feeds in a database or in Lucene so that the text can be easily indexed for word counts.

I need to get the text from the title and description elements of RSS items.

Ideally, for each hourly retrieval from a given feed, I would add a row to a table in a dataset made up of the following columns:

feed_url, title_element_text, description_element_text, polling_date_time

From this, I can look up any element in a feed and calculate keyword counts over whatever time window is required.

This can be done with a database table, using hashmaps to calculate the counts. But can Lucene achieve this degree of granularity at all? If so, would each feed form a Lucene document, or would each 'row' from the database table form one?
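The hashmap counting I have in mind would look something like this in plain Java (a rough sketch only; the class name is made up and the whitespace split is a naive stand-in for real tokenization):

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordCounter {
    // term -> total number of occurrences across all polled title/description texts
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Add the text of one title or description element to the running counts.
    // Splitting on whitespace is a naive stand-in for a real analyzer.
    public void addText(String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.length() == 0) {
                continue;
            }
            Integer current = counts.get(token);
            counts.put(token, current == null ? 1 : current + 1);
        }
    }

    public int countOf(String term) {
        Integer c = counts.get(term.toLowerCase());
        return c == null ? 0 : c;
    }
}
```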

Can anyone advise?

Thanks

Martin O'Shea.

A: 

My parsing of your question is:

for each item in feed:
    calculate term frequency of item, then add to feed's frequency list

This is not something that Lucene excels at, so CouchDB or another database might be as good a choice, if not better (as larsmans suggests). However, it can be done in Lucene, in a way that is probably slightly easier than with other databases:

// Lucene 3.x: enumerate every unique term in the index and
// record how many documents each term appears in.
HashMap<String, Integer> terms =
    new HashMap<String, Integer>((int) indexReader.getUniqueTermCount());
TermEnum tEnum = indexReader.terms();
while (tEnum.next())
{
    terms.put(tEnum.term().text(), tEnum.docFreq());
}
tEnum.close();

All Lucene is saving you is the difficulty of calculating the docfreq, and it will probably be a bit faster than looping through all the rows yourself. But I'd be surprised if the performance difference is noticeable for reasonably small data sets.
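To make the comparison concrete, here is roughly what "looping through all the rows yourself" means for document frequency. Each row contributes at most once per term, which matches what docFreq() gives you. This is a plain-Java sketch with a made-up class name and naive whitespace tokenization:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DocFreqCounter {
    // term -> number of rows containing the term at least once
    public static Map<String, Integer> docFreq(List<String> rows) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        for (String row : rows) {
            // A set ensures each row counts at most once per term,
            // matching Lucene's notion of document frequency.
            Set<String> seen =
                new HashSet<String>(Arrays.asList(row.toLowerCase().split("\\s+")));
            for (String term : seen) {
                Integer current = freq.get(term);
                freq.put(term, current == null ? 1 : current + 1);
            }
        }
        return freq;
    }
}
```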

Xodarap