I have a 64-bit server with 8 GB of RAM and dual quad-core CPUs. No resource ever hits 100% (except, I guess, the JVM -- right?).

I need to index several million records for Solr, but the machine is in production. I recognize having a second machine for indexing would be helpful.

Should I run a second JVM instance dedicated to Solr?

Right now, when I run an index, pages that are normally served in 200 milliseconds take about 1.5 seconds, sometimes more, occasionally even hitting the dreaded "Service is Unavailable" error.

I adjusted my JVM Heap as follows:

-Xmx1024m
-XX:MaxPermSize=256m

In case I'm chasing the wrong solution, allow me to broaden the landscape a bit. It seems that I can't affect the indexing speed of Solr. I had previously been indexing about 150,000 records per hour on a dev server virtualized on a workstation. In a production environment with much more hardware available, I'm indexing at the exact same speed.
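To put that rate in perspective, here is a rough back-of-the-envelope sketch. The 150,000 records per hour figure is the one measured above; the total record counts are illustrative assumptions, since "several million" is not an exact number.

```python
def hours_to_index(total_records, records_per_hour):
    """Estimate the wall-clock hours needed for a full index run."""
    return total_records / records_per_hour


# Observed rate from the dev and production runs described above.
OBSERVED_RATE = 150_000

# Hypothetical totals standing in for "several million" records.
for total in (1_000_000, 3_000_000, 5_000_000):
    hours = hours_to_index(total, OBSERVED_RATE)
    print(f"{total:>9,} records: about {hours:.1f} hours")
```

At this rate, a full run takes most of a day or more, which is why slowing the production site for its whole duration is a problem.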

Without data to prove it, I think that my JVM adjustments did not speed up the indexing, although they may have allowed the CF server to continue serving pages. The indexing speed is terribly slow, but I do know that it's not a function of the data access layer: I rewrote it from pure ORM to objects backed by SQL stored procedures, thinking that was the slowdown, and it had no effect.

+1  A: 

Have you tried these optimization tips?

http://bloggeraroundthecorner.blogspot.com/2009/08/tuning-coldfusion-solr-part-1.html

http://bloggeraroundthecorner.blogspot.com/2009/08/tuning-coldfusion-solr-part-2.html

http://bytestopshere.com/post.cfm/lessons-learned-moving-from-verity-to-solr-part-1

Henry
I see by my visited link colors that I've been on the first one, but not the other two. I'll check those out. Thanks! My hope is that I can invest a couple of hours in finding a good trick that will save many hours of indexing time. Thanks again.
Chris Adragna
+2  A: 

Use a separate instance for indexing. The only trick is getting the running search instance to re-read the updated index; for that, set up a master (the indexer) and a slave (the searcher) and use replication. That way the searcher is not interrupted, and the indexer runs in its own JVM with its own share of the resources.
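As a rough illustration of the "re-read the updated index" step: with Solr's built-in HTTP replication, the slave can be asked to pull the latest index from the master by calling its replication handler with `command=fetchindex`. The sketch below only builds the URLs. It assumes the replication handler is registered at `/replication` in each instance's solrconfig.xml, and the host names and port are hypothetical placeholders.

```python
from urllib.parse import urlencode


def replication_url(base_url, command):
    """Build a URL for a Solr replication handler command."""
    query = urlencode({"command": command})
    return f"{base_url.rstrip('/')}/replication?{query}"


# Hypothetical hosts: the indexer (master) and the searcher (slave).
MASTER = "http://index-host:8983/solr"
SLAVE = "http://search-host:8983/solr"

# Ask the master which index version it currently has.
print(replication_url(MASTER, "indexversion"))

# Ask the slave to pull the latest index from its configured master.
print(replication_url(SLAVE, "fetchindex"))
```

Requesting the second URL (with a browser, curl, or `urllib.request.urlopen`) after an indexing run finishes is one way to trigger the refresh on demand; the slave can also be configured to poll the master on an interval instead.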

recursive9
Excellent. I figured I couldn't just have a JVM for indexing AND have search work. You gave me the missing link. Thanks!
Chris Adragna