What is the best way to split an existing Lucene index into two halves, i.e. so that each split contains half of the total number of documents in the original index?

+1  A: 

A fairly robust mechanism is to use a checksum of the document, modulo the number of indexes, to decide which index it will go into.
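
A minimal sketch of that routing rule, assuming each document has some stable string key (the `docKey` parameter and the `ShardRouter` class are hypothetical names, not part of Lucene). It uses CRC32 from the JDK as the checksum; any stable hash would do. Note that this gives roughly equal splits on average, not an exact half-and-half division.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: picks a target index for a document from a checksum of its key.
final class ShardRouter {
    private ShardRouter() {}

    /** Returns the target index number in [0, numIndexes) for the given document key. */
    static int shardFor(String docKey, int numIndexes) {
        if (numIndexes <= 0) {
            throw new IllegalArgumentException("numIndexes must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(docKey.getBytes(StandardCharsets.UTF_8));
        // getValue() returns the checksum as a non-negative long, so % never goes negative.
        return (int) (crc.getValue() % numIndexes);
    }
}
```

For two halves you would call `ShardRouter.shardFor(key, 2)` for every document and add it to index 0 or index 1 accordingly. The same key always maps to the same index, which also makes later updates and deletes easy to route.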

Marcelo Cantos
+3  A: 

The easiest way to split an existing index (without reindexing all the documents) is to:

  1. Make another copy of the existing index (e.g. cp -r myindex mycopy)
  2. Open the first index, and delete half the documents (doc IDs 0 up to, but not including, maxDoc / 2)
  3. Open the second index, and delete the other half (doc IDs maxDoc / 2 up to, but not including, maxDoc)
  4. Optimize both indices, so the deleted documents are actually purged

This is probably not the most efficient way, but it requires very little coding to do.
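
The only arithmetic involved is picking the two doc ID ranges. Here is a rough sketch of that part alone; `HalfSplit` and its method names are made up for illustration, and the actual delete call is left out because it depends on your Lucene version. One caveat: maxDoc also counts slots of already-deleted documents, so if the original index has deletions the two halves may not hold exactly the same number of live documents.

```java
// Hypothetical helper: computes which doc ID ranges to delete from each copy.
// Ranges are half-open: {from, to} means doc IDs from (inclusive) up to to (exclusive).
final class HalfSplit {
    private HalfSplit() {}

    /** Range to delete from the first copy: the lower half of the doc ID space. */
    static int[] deleteRangeForFirstCopy(int maxDoc) {
        checkMaxDoc(maxDoc);
        return new int[] {0, maxDoc / 2};
    }

    /** Range to delete from the second copy: the upper half of the doc ID space. */
    static int[] deleteRangeForSecondCopy(int maxDoc) {
        checkMaxDoc(maxDoc);
        return new int[] {maxDoc / 2, maxDoc};
    }

    private static void checkMaxDoc(int maxDoc) {
        if (maxDoc < 0) {
            throw new IllegalArgumentException("maxDoc must be non-negative");
        }
    }
}
```

With an odd maxDoc, say 7, the first copy deletes IDs 0 to 2 and keeps four documents, while the second copy deletes IDs 3 to 6 and keeps three. The two ranges never overlap and together cover every doc ID, so no document is lost or duplicated across the two halves.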

bajafresh4life