Q:

Is there a way to generate separate index files from a single index, based on some rule, without reindexing the documents?

The original index contains non-stored fields, which means I can't simply read each document and add it to the destination index.

One option mentioned on SO is to clone the index into many copies and then delete, from each copy, the documents that don't belong to it. I'm looking for a better solution.

A: 

One option mentioned on SO is to clone the index into many copies and then delete, from each copy, the documents that don't belong to it. I'm looking for a better solution.

What's wrong with that approach? It strikes me as a very clean solution, involving just a few lines of code.
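As a rough illustration, here is a minimal sketch of the delete step against a recent Lucene release. The "category" field and its values are hypothetical stand-ins for whatever rule decides which index a document belongs to:

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;

public class PruneIndexCopy {
    // Opens a cloned index and deletes every document whose (hypothetical)
    // "category" field does not hold the value this copy should keep.
    public static void prune(String copyPath, String keepValue) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get(copyPath));
             IndexWriter writer = new IndexWriter(dir,
                     new IndexWriterConfig(new StandardAnalyzer()))) {
            // Match everything except the documents that belong to this copy.
            BooleanQuery.Builder toDelete = new BooleanQuery.Builder();
            toDelete.add(new MatchAllDocsQuery(), Occur.MUST);
            toDelete.add(new TermQuery(new Term("category", keepValue)), Occur.MUST_NOT);
            writer.deleteDocuments(toDelete.build());
            writer.forceMerge(1); // optional: rewrite segments to reclaim space
            writer.commit();
        }
    }
}

Note that the field driving the split must be indexed (it doesn't have to be stored), since the deletion is driven by a query.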

UPDATE:

Regarding the scenario where you have a 100 GB index that you want to split 500 ways, try this: for every subset of documents that you want to carve out of the index, create a hard-linked copy of the source index, open the linked copy, and delete the documents that don't belong to that subset. On Linux, hard-linking the directory can be done with:

cp -lrp myindex myindex.copy

This can be done as many times as you need, and the links consume essentially no extra disk space, since they point at the same underlying file data. It is also safe: Lucene creates new files rather than modifying existing ones in place, so deleting documents from one linked copy doesn't disturb the others.
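If you'd rather not shell out to cp, a rough equivalent of the hard-link tree can be built with java.nio.file; this sketch reuses the myindex/myindex.copy paths from the example above and assumes the filesystem supports hard links:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class HardLinkIndex {
    // Rough equivalent of `cp -lrp src dst`: recreate the directory tree,
    // hard-linking each file instead of copying its contents.
    public static void linkTree(Path src, Path dst) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = dst.resolve(src.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.createLink(target, p); // same inode, no data copied
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        linkTree(Paths.get("myindex"), Paths.get("myindex.copy"));
    }
}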

bajafresh4life
Nothing is wrong with it, but think about doing this when your index is 100 GB and you want to split it into 500 different ones. Just run that scenario through your mind in terms of the time and space needed.
Khash