I'm really puzzled why Solr keeps dying with java.lang.OutOfMemoryError during indexing, even though it has a few GB of memory.

Is there a fundamental reason why it needs manual tweaking of config files / JVM parameters, instead of just figuring out how much memory is available and limiting itself to that? No other program except Solr ever has this kind of problem.

Yes, I can keep tweaking the JVM heap size every time such a crash happens, but this all feels so backwards.

Here's the stack trace of the latest such crash, in case it is relevant:

SEVERE: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
    at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
    at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
    at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
    at org.apache.lucene.search.Searcher.search(Searcher.java:171)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
A: 

Looking at the stack trace, it looks like you are performing a search and sorting by a field. To sort by a field, Lucene internally needs to load all the values of all the terms in that field into memory. If the field contains a lot of data, it is very possible that you will run out of memory.
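
For illustration, here is a minimal sketch of that code path, assuming the Lucene 2.9-era API that matches the stack trace; the index path and the "title" field are made up:

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SortMemoryDemo {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
            IndexSearcher searcher = new IndexSearcher(reader);

            // Sorting by a string field makes Lucene load every value of that
            // field, for every document, into the FieldCache on first use.
            Sort sort = new Sort(new SortField("title", SortField.STRING));
            TopDocs hits = searcher.search(new MatchAllDocsQuery(), null, 10, sort);
            System.out.println("hits: " + hits.totalHits);

            searcher.close();
            reader.close();
        }
    }

On the Solr side, the same path is reached by any request that carries a sort parameter on a string field, e.g. ...&sort=title+asc (again, "title" is just a placeholder field name).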

bajafresh4life
I don't think I'm doing any of that; it was only indexing. How can I debug such things?
taw
Was a .hprof file created when the OOM exception was thrown? If so, you could use http://eclipse.org/mat/ to analyze the file and determine how many objects, and of what size, were in memory at the time of the exception. That should give you an idea of what the problem is.
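
If no dump was written automatically, you can ask the JVM to produce one on the next OOM with the standard HotSpot flags below (the dump path is just an example; under Tomcat these typically go into JAVA_OPTS or CATALINA_OPTS):

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/solr-oom.hprof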
Flynn81
A: 

A wild guess: the documents you are indexing are very large.

By default, Lucene only indexes the first 10,000 terms per field of a document to avoid OutOfMemory errors; you can raise this limit with setMaxFieldLength.

Also, you could call optimize() and close() on the IndexWriter as soon as you are done processing.
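
A minimal sketch of both suggestions, assuming the Lucene 2.9-era IndexWriter API; the index path and the 25,000-term limit are arbitrary examples:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class IndexingSetup {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/path/to/index")),
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.LIMITED);   // default limit: 10,000 terms per field

            writer.setMaxFieldLength(25000);  // raise the per-field term limit if documents get truncated

            try {
                // ... writer.addDocument(...) calls go here ...
                writer.optimize();   // merge segments once, after all documents are added
            } finally {
                writer.close();      // release buffers and file handles as soon as indexing is done
            }
        }
    }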

A more definite way is to profile and find the bottleneck =]

Narayan
Definitely not. They are typically a few hundred bytes each, and there are millions of them.
taw
A: 

I'm not certain there is a surefire way to ensure you won't run into OutOfMemoryErrors with Lucene. The problem you are facing is related to the use of the FieldCache, which, per the Lucene API docs, "Maintains caches of term values." If the terms exceed the amount of memory allocated to the JVM, you'll get the exception.

The documents are being sorted (at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)), which will take up as much memory as is needed to store the sort field's terms for the entire index.

You'll need to review the projected size of the sortable fields and adjust the JVM settings accordingly.
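
As a rough back-of-the-envelope, assuming the FieldCache.StringIndex layout used here (one int ord per document plus one String per unique term); every number below is a made-up illustration, not measured data:

    public class FieldCacheEstimate {
        public static void main(String[] args) {
            // Hypothetical counts -- substitute your own document and term numbers.
            long maxDoc       = 10000000L;   // documents in the index
            long uniqueTerms  = 2000000L;    // distinct values in the sort field
            long avgTermChars = 20L;         // average length of those values

            long ordArrayBytes = maxDoc * 4L;                             // one int per document
            long termBytes     = uniqueTerms * (40L + 2L * avgTermChars); // rough String overhead + chars
            long totalBytes    = ordArrayBytes + termBytes;               // cache cost of this one sort field

            System.out.println("estimated FieldCache size: "
                    + (totalBytes / (1024 * 1024)) + " MB");  // ~190 MB with these numbers
        }
    }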

Flynn81
The fields are fairly small, and there aren't many of them per document; on the other hand, there is quite a large number of documents. Does that mean I'll need to increase the JVM heap size every time I increase the number of documents in Solr? That's a pretty drastic scalability bottleneck.
taw
Not sure; I'm familiar with Lucene, but not with Solr sitting on top of it. My answer was based on experience with an index of 16 million documents with fields that could be over 4,000 characters long. If you want to sort 1,000 of those documents, Lucene will use a certain amount of memory. My suggestion is to calculate a rough maximum memory usage and allocate that to the JVM (keeping growth rates in mind). Does anyone have any other ideas?
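
For example, the resulting figure would typically be applied through the heap flags when starting Tomcat (often via a setenv.sh); the values here are illustrative only:

    # Illustrative values only -- derive them from your own estimate, with headroom for growth.
    export CATALINA_OPTS="-Xms512m -Xmx2048m"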
Flynn81
A: 

Are you using post.jar to index the data? That jar has a bug in Solr 1.2/1.3, I think (but I don't know the details). Our company has fixed it internally, and it should also be fixed in the latest trunk (Solr 1.4/1.5).

Karussell
No, I submit XML over TCP/IP from Rails/acts_as_solr.
taw