I'm in the process of updating a tool that uses a Lucene index. As part of this update we are moving from Lucene 2.0.0 to 3.0.2. For the most part this has been entirely straightforward. However, in one instance I can't seem to find a straightforward conversion.
Basically I have a simple query and I need to iterate over all hits. In Lucene 2 this was simple, e.g.:
Hits hits = indexSearcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    // Process hit
}
In Lucene 3 the API for IndexSearcher has changed significantly, and although I can bash together something that works, it is only by getting the top X documents and making sure that X is sufficiently large.
While the number of hits (in my case) is typically between zero and ten, there are anomalous situations where they could number much higher. Having a fixed limit therefore feels wrong. Furthermore, setting the limit really high causes an OutOfMemoryError, which suggests that space for all X possible hits is allocated immediately. As this operation is carried out a lot, something reasonably efficient is desired.
Edit:
Currently I've got the following to work:
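TopDocs hits = indexSearcher.search(query, MAX_HITS);
// scoreDocs holds at most MAX_HITS entries, while totalHits may be larger
for (int i = 0; i < hits.scoreDocs.length; i++) {
    // Process hit, e.g. indexSearcher.doc(hits.scoreDocs[i].doc)
}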
This works fine except that:
a) what if there are more hits than MAX_HITS?
b) if MAX_HITS is large then I'm wasting memory, as room for each hit is allocated before the search is performed.
As most of the time there will only be a few hits, I don't mind doing follow-up searches to get the subsequent hits, but I can't seem to find a way to do that.
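To make the follow-up search idea concrete, here is a minimal sketch of one way it might look. It relies on TopDocs.totalHits reporting the total number of matches regardless of the limit passed to the search. To keep the sketch runnable without an index, Page is a hypothetical stand-in for TopDocs and the IntFunction is a hypothetical stand-in for a call such as indexSearcher.search(query, n); neither is a Lucene type.

```java
import java.util.function.IntFunction;

final class GrowingLimitSearch {

    /** Hypothetical stand-in for TopDocs: total match count plus the top-n doc ids. */
    static final class Page {
        final int totalHits;   // all matches for the query, independent of the limit
        final int[] docIds;    // at most 'limit' entries, like TopDocs.scoreDocs

        Page(int totalHits, int[] docIds) {
            this.totalHits = totalHits;
            this.docIds = docIds;
        }
    }

    /**
     * Searches with a small limit first. Only if that result was truncated
     * is the search repeated once, with the exact hit count as the limit.
     * 'search' is assumed to behave like indexSearcher.search(query, n).
     */
    static int[] allHits(IntFunction<Page> search, int initialLimit) {
        Page first = search.apply(initialLimit);
        if (first.totalHits <= first.docIds.length) {
            return first.docIds;                      // common case: one search is enough
        }
        return search.apply(first.totalHits).docIds;  // rare case: second, exactly sized search
    }
}
```

With an initial limit of around ten, the common case would cost a single search with a small queue, and only the anomalous queries would pay for a second pass. The initial limit should be at least 1, and the index is assumed not to change between the two searches.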