ansaurus

Question

Collect all hits for a search in Lucene / Optimization

Answer 1

+1 A:

You are probably storing a lot of information in each document. Reduce the number of stored fields as much as you can.

Secondly, when retrieving fields, load only the ones you need. The following IndexReader method lets you specify which stored fields to load.

public abstract Document document(int n, FieldSelector fieldSelector)

This way you don't load fields that are not used.

You can use the following code sample.

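import java.util.Collections;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.SetBasedFieldSelector;

// Load only the "idFieldName" stored field; load nothing lazily.
FieldSelector idFieldSelector =
    new SetBasedFieldSelector(Collections.singleton("idFieldName"), Collections.<String>emptySet());
for (int i : resultDocIDs) {
    String id = reader.document(i, idFieldSelector).get("idFieldName");
}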

Shashikant Kore 2010-01-20 12:10:17

Thanks for the suggestion and the code sample. I hadn't known about FieldSelector; it could be of use in the future. But I store only one field in the document, and that's the one I have to fetch in the end. The only field I store is the sentence itself plus a few grammatical annotations, so a single document (i.e. a sentence) takes no more than 300-400 bytes. (Additional info: I have indexed around 50M documents.)

Amaç Herdağdelen 2010-01-20 12:37:14
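
For a rough sense of scale, the figures in the comment above can be multiplied out. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and the result ignores index overhead and any compression of stored fields.

```java
public class StoredSizeEstimate {
    // Total stored bytes for docCount documents of bytesPerDoc bytes each.
    static long totalBytes(long docCount, long bytesPerDoc) {
        return docCount * bytesPerDoc;
    }

    public static void main(String[] args) {
        long docs = 50_000_000L;            // "around 50M documents"
        long low = totalBytes(docs, 300);   // lower per-document size from the comment
        long high = totalBytes(docs, 400);  // upper per-document size from the comment
        System.out.println(low / 1_000_000_000.0 + " GB to " + high / 1_000_000_000.0 + " GB");
    }
}
```

With these inputs the estimate comes to roughly 15 to 20 GB of stored field data in total, so a query that matches a large share of the index may end up reading a substantial amount of it even when only one field is stored.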

Answer 2

+1 A:

Scaling Lucene and Solr discusses many ways to improve Lucene performance. As you are working on Lucene search within Wikipedia, you may be interested in Rainman's Lucene Search of Wikipedia. He mostly discusses algorithms rather than performance, but it may still be relevant.

Yuval F 2010-01-20 13:10:57
