I have approximately 10 million objects indexed using NIOFSDirectory.

When I retrieve documents with MatchAllDocsQuery, performance is significantly worse than with other types of queries, such as BooleanQuery. In the tests I ran, it was approximately 100 times slower.

Since I am only interested in the top n documents anyway, is there a way to retrieve them from the Searcher object without using MatchAllDocsQuery?

I am also considering using WildcardQuery on a random property of the object, but Lucene in Action claims that there are "performance degradations" associated with WildcardQuery.

Suggestions are greatly appreciated!

+1  A: 

As Yuval pointed out in the comment, you have not specified the criteria by which the top documents should be chosen. If you intend to retrieve random documents, you can simply use IndexReader.document() without going through a search at all. If you do have some criteria, you can use a TermQuery (or the query returned by the QueryParser).

Shashikant Kore
Thanks for the answer. I am constrained to work with a Searcher, because there could be cases where I would need a searcher. For example, if the input query is "foo:1, bar:2", I would need to perform a "foo:1" search on one of my partitions (read my comments above). I am currently toying with the searcher.doc(i) method. It may be working...
Cambium
I suggest you pre-process your queries. It appears that searching for a value of bar never helps you. Therefore, start with the initial query and choose your index according to the value of bar (still without using Lucene). Then issue the rest of the query, if anything is left. For example, if the input query is "foo:1, bar:2", choose the second index and issue the query "foo:1". If it is just "bar:2", get a random document from the second index, possibly as Shashikant Kore suggested.
Yuval F
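
The pre-processing step suggested above could be sketched roughly as follows. This is only an illustration in plain Java with no Lucene calls: the names QueryPreprocessor, Routed and preprocess are hypothetical, and the comma-separated "field:value" input format is assumed from the "foo:1, bar:2" example in the comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an input such as "foo:1, bar:2" into the
// partition value (taken from the partition field) and the remaining clauses.
final class QueryPreprocessor {

    /** Result of the split: which partition to use and what is left to search. */
    static final class Routed {
        final String partition;      // value of the partition field, or null if absent
        final String remainingQuery; // remaining clauses, or "" if nothing is left

        Routed(String partition, String remainingQuery) {
            this.partition = partition;
            this.remainingQuery = remainingQuery;
        }
    }

    /**
     * Assumes clauses are separated by commas and look like "field:value".
     * The clause on partitionField is removed and used only to pick the index.
     */
    static Routed preprocess(String input, String partitionField) {
        String partition = null;
        List<String> rest = new ArrayList<>();
        for (String clause : input.split(",")) {
            String c = clause.trim();
            if (c.isEmpty()) {
                continue;
            }
            int colon = c.indexOf(':');
            String field = colon < 0 ? "" : c.substring(0, colon).trim();
            if (field.equals(partitionField)) {
                partition = c.substring(colon + 1).trim();
            } else {
                rest.add(c);
            }
        }
        return new Routed(partition, String.join(", ", rest));
    }
}
```

With this sketch, preprocess("foo:1, bar:2", "bar") selects partition "2" and leaves "foo:1" to be run against that partition's searcher. For "bar:2" alone the remaining query is empty, which is the case where a document can be fetched directly instead of running MatchAllDocsQuery.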