views: 29

answers: 1

We maintain a Lucene index containing around 20 million documents. The nature of the search queries is such that indexing and querying can easily be split across separate indexes.

To achieve that, we would need to keep many (potentially thousands of) IndexWriters and IndexReaders/IndexSearchers in memory to handle indexing and querying for each of these indices (queries never span multiple indexes).

I need to understand the memory pressure this will cause, and any potential solutions anyone can suggest.
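One common way to avoid holding thousands of searchers open at once is to bound them with an LRU cache, closing handles as they are evicted. The sketch below is not part of the question; it uses only the JDK's `LinkedHashMap` in access order, and `V` stands in for a real `IndexReader`/`IndexSearcher` pair:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache of per-index search handles.
// Keeps at most maxOpen entries, evicting the least recently used.
public class SearcherCache<V> extends LinkedHashMap<String, V> {
    private final int maxOpen;

    public SearcherCache(int maxOpen) {
        super(16, 0.75f, true); // access-order = LRU eviction
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // In real code, close the evicted IndexReader/IndexSearcher here
        // before letting the map drop it.
        return size() > maxOpen;
    }
}
```

A caller would look a handle up by index name, opening it on a miss; only the most recently used `maxOpen` indices stay resident.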

+2  A: 

You might want to take a look at Solr, which supports the creation and management of multiple indices (called cores) out of the box. It will also handle all the work of distribution over multiple nodes if that becomes necessary.
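For reference, multiple cores in (legacy, pre-SolrCloud) Solr are declared in `solr.xml`; the core names and directories below are placeholders, not anything from the question:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per logical index; instanceDir holds its conf/ and data/ -->
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```

Cores can also be created and unloaded at runtime through the CoreAdmin handler, so you are not limited to what is listed at startup.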

That being said, the per-index memory overhead is very low (by design). I believe it's roughly one byte per document (for norms), plus an in-memory term index entry for every 256 unique terms.

bajafresh4life