Hi All

I have a Solr/Lucene index of about 700 GB. New documents arrive in real time, roughly 1000 every half hour, and need to be indexed. In my current setup an executable runs every 30 minutes and indexes the documents that are not yet indexed, because the requirement is that new documents become searchable as soon as possible. However, this process slows down searching.

What I want to ask: is this the best way to index the latest documents, or is there a better way?

Please explain.

Regards Ahsan

+6  A: 

First, remember that Solr is not a real-time search engine (yet). There is still work to be done.

You can use a master/slave setup, where indexing is done on the master and searching on the slave. This way, indexing does not affect search performance. After the commit is done on the master, force the slave to fetch the latest index from the master. While the new index is being replicated to the slave, the slave keeps processing queries with the previous index.
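As a rough sketch of the "force the slave to fetch" step: Solr's replication handler accepts a `command=fetchindex` request over HTTP. The host name, port, and base path below are hypothetical, and the sketch assumes the handler is registered at `/replication` in the slave's solrconfig.xml (with multiple cores the core name is part of the path), so adjust it to your installation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(slave_base_url, command):
    """Build a replication handler URL such as .../replication?command=fetchindex."""
    query = urlencode({"command": command})
    return slave_base_url.rstrip("/") + "/replication?" + query


def force_fetch(slave_base_url, timeout=10):
    """Ask the slave to pull the latest committed index from its master.

    Returns True if the slave answered with HTTP 200. The slave keeps
    serving queries from its current index while the new one is copied.
    """
    url = replication_url(slave_base_url, "fetchindex")
    with urlopen(url, timeout=timeout) as resp:
        return resp.status == 200


# Hypothetical slave address (an assumption, not from the question).
SLAVE = "http://slave-host:8983/solr"
print(replication_url(SLAVE, "fetchindex"))
```

The indexing job would call `force_fetch(SLAVE)` right after its commit on the master. Alternatively, the slave can be configured to poll the master at a fixed interval instead of being triggered.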

Also, check your cache warming settings. If those settings are too aggressive, warming can itself slow down searches. Check as well the warming queries that are launched on the newSearcher event.

Pascal Dimassimo
+3  A: 

You can do this easily with Lucene. Split the index into multiple parts (or, to be precise, create "smaller" parts while building the index). Create a searcher for each part and keep a reference to it. You can then create a MultiSearcher on top of these individual searchers.

Now, only one index will receive the new documents. At regular intervals, add documents to this index, commit, and re-open its searcher.

After this last index is updated, create a new MultiSearcher, reusing the previously opened searchers for the other parts.

Thus, at any point, you re-open only one searcher, and that is quite fast.
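To illustrate the idea without depending on a particular Lucene version, here is a small pure-Python model of it. `Part` and `MultiPartSearcher` are hypothetical stand-ins for an index part with its searcher and for MultiSearcher; they are not Lucene classes, and the whitespace tokenizer is only a placeholder.

```python
from collections import defaultdict


class Part:
    """Toy stand-in for one index part plus its opened searcher."""

    def __init__(self):
        self._pending = []                 # docs added but not yet visible
        self._postings = defaultdict(set)  # term -> doc ids visible to searches
        self.reopen_count = 0

    def add(self, doc_id, text):
        self._pending.append((doc_id, text))

    def commit_and_reopen(self):
        """Make pending docs visible; the work touches this part only."""
        for doc_id, text in self._pending:
            for term in text.lower().split():
                self._postings[term].add(doc_id)
        self._pending.clear()
        self.reopen_count += 1

    def search(self, term):
        return set(self._postings.get(term.lower(), ()))


class MultiPartSearcher:
    """Fans a query out to every part and merges the hits."""

    def __init__(self, parts):
        self.parts = list(parts)

    def search(self, term):
        hits = set()
        for part in self.parts:
            hits |= part.search(term)
        return hits


# Two large, frozen parts and one small live part that gets all new docs.
old_a, old_b, live = Part(), Part(), Part()
old_a.add(1, "solr replication")
old_a.commit_and_reopen()
old_b.add(2, "lucene index")
old_b.commit_and_reopen()
searcher = MultiPartSearcher([old_a, old_b, live])

live.add(3, "real time lucene")
before = searcher.search("lucene")   # doc 3 is not visible yet
live.commit_and_reopen()             # only the live part is re-opened
searcher = MultiPartSearcher([old_a, old_b, live])  # old parts reused as-is
after = searcher.search("lucene")    # now includes doc 3
```

The point of the design is that the periodic commit touches only the small live part, while the searchers over the large parts stay open and keep their state between updates.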

Shashikant Kore
A: 

^^ I do this with plain Lucene (not Solr), and it works really nicely. However, I'm not sure whether there is a Solr way to do it at the moment. Twitter recently went with Lucene for search and gets effectively real-time searching by writing to the index on every update. Their index resides completely in memory, so updating and reading it costs very little and takes effect almost instantly. A Lucene index can always be read while it is being written to, as long as there is only one writer at a time.

recursive9