+5  Q: 

Lucene performance

Hi guys,

Could you please suggest steps to follow to improve Lucene performance, especially with large data (around 1 TB of PDF files to be indexed)?

+3  A: 
  1. Read Scaling Lucene and Solr.
  2. Define your needs from Lucene (for example: you are indexing PDFs - do you need to store the full text, only make it searchable, or neither?)
  3. Make a small-scale experiment - index a few documents, see whether retrieval is good enough.
  4. Try to index the whole thing (considering the paper's tips for quick indexing and for indexing for retrieval speed) - Is retrieval good enough? Is performance good enough?
  5. Iterate.
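
To make steps 3 and 4 a bit more concrete, here is a rough sketch of how one might time a small sample run and extrapolate to the full corpus. It does not use Lucene at all: `index_one` is a hypothetical stand-in for whatever function adds one document to your index, and `measure_throughput` and `estimate_hours` are names made up for this illustration. The extrapolation assumes throughput stays constant as the index grows, which is probably optimistic, so treat the result as a lower bound.

```python
import time
from typing import Callable, Iterable


def measure_throughput(docs: Iterable[bytes],
                       index_one: Callable[[bytes], None]) -> float:
    """Index every sample document and return the observed bytes per second.

    index_one is a hypothetical callback: plug in your own indexing call.
    """
    total_bytes = 0
    start = time.perf_counter()
    for doc in docs:
        index_one(doc)
        total_bytes += len(doc)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very small samples.
    return total_bytes / elapsed if elapsed > 0 else float("inf")


def estimate_hours(corpus_bytes: int, bytes_per_second: float) -> float:
    """Linearly extrapolate the sample throughput to the whole corpus."""
    return corpus_bytes / bytes_per_second / 3600.0
```

For example, if the sample run measured 5 MB/s (a made-up figure), `estimate_hours(10**12, 5_000_000)` gives roughly 56 hours for 1 TB. If that is too slow, that is the cue to apply the indexing tips and measure again.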
Yuval F
Hi, thanks for the reply. How difficult is it to convert a Lucene implementation to a Solr implementation?
KP
I have never tried. From what I hear, it is very easy to start using Solr. Maybe you should try Solr on a small scale. Maybe I should take my own advice...
Yuval F
+3  A: 

Please check the tips on the question Optimizing Lucene Performance. Since you are working with a large amount of data, you also need to watch index creation performance. Tips on improving both indexing performance and search performance are available on the Lucene Wiki.

Shashikant Kore