What are the various ways of optimizing Lucene performance?

Should I use a caching API to store my Lucene search query, so that I avoid the overhead of building the query again?

+7  A: 

Have you looked at the following?

Lucene Search Engine: Optimization Techniques

Lucene Optimization Tip: Reuse Searcher

Lucene Optimization Blog

Advanced Text Indexing with Lucene

http://stackoverflow.com/questions/119994/should-an-index-be-optimised-after-incremental-indexes-in-lucene

Mitch Wheat
the luceneoptimization.blogspot.com link at the top of this post is dead
Steen
@steen: Thx but I don't maintain every link on the internet! ;)
Mitch Wheat
+1  A: 

Cheat. Use RAMDirectory to load the entire index into RAM. Afterwards, everything is blazing fast. :)

Emil H
+1  A: 

I have found that the best answer to a performance question is to profile it. Guidelines are great, but there are so many variables that can affect performance, such as the size of your dataset, the types of queries you run, the data types involved, etc.

Get the NetBeans profiler or something similar and try different approaches. Use the articles linked to by Mitch, but make sure you actually test what helps and what (often surprisingly) hurts.

There is also a good chance that any performance differences you can get from Lucene will be minor compared to performance improvements in your code. The profiler will point that out as well.
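
To make "actually test what helps" concrete, here is a minimal timing sketch that uses only the Java standard library. It is no substitute for a real profiler; it just gives a rough before/after number for one operation. The class and method names (`QueryTimer`, `medianNanos`) are made up for this illustration, and the `Runnable` is assumed to wrap whatever search call you want to measure.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: times an arbitrary task.
final class QueryTimer {

    /**
     * Runs the task `warmup` times untimed (to let the JIT settle),
     * then times `runs` executions and returns the middle sample in
     * nanoseconds (the upper median when `runs` is even).
     */
    static long medianNanos(Runnable task, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }
}
```

Call it once with the current setup and once with the change you are evaluating, using the same queries against the same index, and compare the two medians. The median is used here because a single slow run (garbage collection, cold disk cache) can distort an average.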

Nathan Voxland
+4  A: 

Quick tips:

  • Keep the size of the index small. Eliminate norms and term vectors when they are not needed. Set the Store flag for a field only if it is a must.
  • Obvious, but an oft-repeated mistake: create only one instance of Searcher and reuse it.
  • Keep the index on fast disks. RAM, if you are paranoid.
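
The "one Searcher, reused" tip can be sketched roughly as below. This is a generic holder written against the Java standard library only, not Lucene's own API: `SharedInstance` is a hypothetical name, and the `Supplier` is assumed to be whatever code opens your index and builds the searcher.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: lazily creates one instance
// and hands that same instance to every caller.
final class SharedInstance<T> {

    // Assumed to do the expensive work, e.g. open the index and build a searcher.
    private final Supplier<T> factory;
    private volatile T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use (thread-safe). */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

In a web application you would typically create one such holder at startup, keep it in the application context, and have every request call `get()` instead of constructing a new searcher. The sketch does not handle reopening the searcher after the index changes; that needs extra logic on top.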
Shashikant Kore
A: 

If we have 1 TB of data (PDF files) whose content should be indexed and made available for Lucene search, will RAMDirectory work? If yes, what kind of hardware is required? If not, how do we deal with this?

If we create only one instance of Searcher and reuse it, does this mean we create it once and make it available in the application context, then use the same searcher for all requests? And if we have indexes in multiple directories, could multiple index searchers be kept in the context in the same way?

KP
You shouldn't use an answer to ask more questions. This should be added to the original question in an edit.
workmad3