Hi,

Currently I am calling the optimize method of the IndexWriter after the writes complete. Since my data set is huge, optimizing the index takes a long time and needs extra disk space (twice the actual index size). I am very concerned about this because many documents are added to the index frequently.

So

  1. Is it OK to turn off optimize?
  2. What are the performance implications? For example, how much slower is querying when the index is not optimized?

Cheers Ramesh Vel

+4  A: 

The Lucene FAQ says:

What is index optimization and when should I use it?

The IndexWriter class supports an optimize() method that compacts the index database and speeds up queries. You may want to use this method after performing a complete indexing of your document set or after incremental updates of the index. If your incremental update adds documents frequently, you want to perform the optimization only once in a while to avoid the extra overhead of the optimization.

If I decide not to optimize the index, when will the deleted documents actually get deleted?

Documents that are deleted are marked as deleted. However, the space they consume in the index does not get reclaimed until the index is optimized. That space will also eventually be reclaimed as more documents are added to the index, even if the index does not get optimized.
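To make the FAQ's point concrete, here is a toy model in Java. The class `ToySegment` and its methods are hypothetical and are not part of Lucene; the sketch only assumes that a deletion sets a marker while a merge copies live documents into a new segment, which is how the reclaiming described above happens (optimize being a merge of everything).

```java
import java.util.List;

// Toy model only: this class is hypothetical and is not part of Lucene.
// A segment stores documents; deleting one just marks it, it still takes space.
final class ToySegment {
    final int storedDocs;  // documents physically present, live or deleted
    final int deletedDocs; // documents marked deleted but not yet reclaimed

    ToySegment(int storedDocs, int deletedDocs) {
        if (deletedDocs < 0 || deletedDocs > storedDocs) {
            throw new IllegalArgumentException("invalid deleted count");
        }
        this.storedDocs = storedDocs;
        this.deletedDocs = deletedDocs;
    }

    int liveDocs() {
        return storedDocs - deletedDocs;
    }

    // Deleting flips markers only; storedDocs (the space used) is unchanged.
    ToySegment markDeleted(int n) {
        return new ToySegment(storedDocs, Math.min(storedDocs, deletedDocs + n));
    }

    // Merging copies only live documents into a new segment, so the space held
    // by deleted documents is reclaimed. Optimize merges every segment this way.
    static ToySegment merge(List<ToySegment> segments) {
        int live = 0;
        for (ToySegment s : segments) {
            live += s.liveDocs();
        }
        return new ToySegment(live, 0);
    }
}
```

For example, a 100-document segment with 10 deletions still stores 100 documents; merging it with a 50-document segment holding 5 deletions yields a single segment of 135 documents and no deletions.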

cuh
@cuh, that's really helpful... :)
Ramesh Vel
+1  A: 

You know your data best so I would suggest you perform some tests to measure how fast your queries run with and without the optimize step.

According to the javadocs, "in environments with frequent updates, optimize is best done during low volume times, if at all". You should only optimize when necessary. If only 5% of your documents have changed since the last optimize, it is not necessary, so get a feel for how frequently your documents change. Maybe you can optimize less often, say once every few hours or once a day.
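The rule of thumb above (optimize only when enough documents have changed, and only in quiet hours) could be captured in a small scheduling check. This is a minimal sketch: `OptimizePolicy`, its parameters and the example window are hypothetical and not part of Lucene, and the caller is assumed to track how many documents changed since the last optimize.

```java
import java.time.LocalTime;

// Hypothetical helper, not part of Lucene: decides whether an optimize is worth
// running now, based on the share of changed documents and a low-volume window.
final class OptimizePolicy {
    private final double changedFractionThreshold; // e.g. 0.05 for 5%
    private final LocalTime quietStart;            // start of low-volume window
    private final LocalTime quietEnd;              // end of low-volume window

    OptimizePolicy(double changedFractionThreshold, LocalTime quietStart, LocalTime quietEnd) {
        if (changedFractionThreshold < 0.0 || changedFractionThreshold > 1.0) {
            throw new IllegalArgumentException("threshold must be between 0 and 1");
        }
        this.changedFractionThreshold = changedFractionThreshold;
        this.quietStart = quietStart;
        this.quietEnd = quietEnd;
    }

    // True if 'now' falls in [quietStart, quietEnd). Windows that wrap past
    // midnight (e.g. 23:00-05:00) are supported; equal bounds mean always quiet.
    boolean inQuietWindow(LocalTime now) {
        if (quietStart.isBefore(quietEnd)) {
            return !now.isBefore(quietStart) && now.isBefore(quietEnd);
        }
        return !now.isBefore(quietStart) || now.isBefore(quietEnd);
    }

    // Optimize only if more than the threshold fraction of documents changed
    // since the last optimize AND we are inside the low-volume window.
    boolean shouldOptimize(long docsChangedSinceLastOptimize, long totalDocs, LocalTime now) {
        if (totalDocs <= 0) {
            return false;
        }
        double changed = (double) docsChangedSinceLastOptimize / totalDocs;
        return changed > changedFractionThreshold && inQuietWindow(now);
    }
}
```

With a 5% threshold and a 02:00-05:00 window, 50 changes out of 1000 documents would not trigger an optimize, while 200 changes at 03:00 would. When the check passes, the caller would invoke `IndexWriter.optimize()` (replaced by `forceMerge(1)` in Lucene 3.5 and later).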

Also take a look at this thread, in which they advise against calling optimize at all in an environment whose indices are constantly updated, and instead suggest setting a low mergeFactor.
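Why a low mergeFactor helps can be seen with a simplified simulation. This is a sketch under stated assumptions: `MergeSimulation` is hypothetical and is not Lucene's `LogMergePolicy`, which also weighs segment byte sizes and deletions. The model only assumes that each flush adds one small segment and that every group of mergeFactor same-level segments is merged into one larger segment.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of log-structured merging, loosely based on what a
// mergeFactor controls. It is NOT Lucene's LogMergePolicy: real merges also
// consider segment byte sizes, deletions and other settings.
// Each flush adds one level-0 segment; whenever mergeFactor segments share a
// level, they are merged into a single segment on the next level up.
final class MergeSimulation {
    int segments;        // segments left in the index after all flushes
    long merges;         // merge operations performed
    long unitsRewritten; // data copied by merges, in units of one flushed segment

    static MergeSimulation run(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> perLevel = new ArrayList<>(); // segment count per level
        MergeSimulation sim = new MergeSimulation();
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            increment(perLevel, level);
            // Cascade: finishing a merge can complete a group one level up.
            while (perLevel.get(level) == mergeFactor) {
                perLevel.set(level, 0);
                sim.merges++;
                // mergeFactor segments of mergeFactor^level units each.
                sim.unitsRewritten += power(mergeFactor, level + 1);
                level++;
                increment(perLevel, level);
            }
        }
        for (int count : perLevel) {
            sim.segments += count;
        }
        return sim;
    }

    private static void increment(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) {
            perLevel.add(0);
        }
        perLevel.set(level, perLevel.get(level) + 1);
    }

    private static long power(int base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }
}
```

In this model, 999 flushes with mergeFactor 10 leave 27 segments, while mergeFactor 2 leaves only 8, so searches touch fewer segments without an explicit optimize. The price is more merge work: roughly 8106 versus 1890 units of data rewritten. In Lucene 2.x and 3.0 the setting was `IndexWriter.setMergeFactor(int)`; later 3.x releases moved it to `LogMergePolicy.setMergeFactor(int)`.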

dogbane
@fahd, thanks for the link. :)
Ramesh Vel