Is there a fast and easy way of getting term frequencies from a Lucene index, without doing it through the TermVectorFrequencies class, since that takes an awful lot of time for large collections?

What I mean is, is there something like TermEnum which has not just the document frequency but term frequency as well?

UPDATE: Using TermDocs is way too slow.

+2  A: 
erickson

TermDocs gives the TF of a given term in each document that contains the term. You can get the DF by iterating over the <document, frequency> pairs and counting them, although TermEnum should be faster for that. IndexReader has a termDocs(Term) method that returns a TermDocs for the given Term and index.
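
To illustrate the iteration described above, here is a minimal sketch. It deliberately does not depend on Lucene: `Postings` is a hypothetical stand-in for the cursor that `TermDocs` exposes in Lucene 3.x (`next()`, `doc()`, `freq()`), and `TermStats`, `of` and `stats` are names made up for this example. With a real index you would presumably obtain the cursor from `reader.termDocs(term)` and run the same loop.

```java
class TermStats {
    /** Hypothetical stand-in for a TermDocs-style cursor over <document, frequency> pairs. */
    interface Postings {
        boolean next(); // advance to the next pair; false when exhausted
        int doc();      // document number of the current pair
        int freq();     // frequency of the term in that document
    }

    /** Builds an in-memory cursor from {doc, freq} pairs (illustration only). */
    static Postings of(final int[][] pairs) {
        return new Postings() {
            private int i = -1;
            public boolean next() { return ++i < pairs.length; }
            public int doc() { return pairs[i][0]; }
            public int freq() { return pairs[i][1]; }
        };
    }

    /** Returns {DF, total TF}: the number of pairs and the sum of their frequencies. */
    static long[] stats(Postings postings) {
        long df = 0;
        long totalTf = 0;
        while (postings.next()) {
            df++;                       // one pair per document containing the term
            totalTf += postings.freq(); // TF of the term in that document
        }
        return new long[] { df, totalTf };
    }

    public static void main(String[] args) {
        // Term occurs 3 times in doc 0, once in doc 4, twice in doc 7.
        long[] s = stats(of(new int[][] { {0, 3}, {4, 1}, {7, 2} }));
        System.out.println("df=" + s[0] + " tf=" + s[1]); // prints "df=3 tf=6"
    }
}
```

Note that this still visits every document containing the term, so it does not avoid the cost mentioned in the question's update; it only shows how the DF and the collection-wide TF fall out of the same pass.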

Kai Chan
Can this approach be used to determine term frequencies in the result set of a Lucene query?
Aaron Saunders