I created a Solr 1.4 index and would like to serve queries against it for a high-volume application. The index is static: no more updates are allowed. A couple of client apps making requests on the server drive the CPU load to about 200% on a quad-core Ubuntu box, so I was thinking of replicating the index on a second box and running the two in parallel to allow more throughput.

I shut down Solr, copied the index to a separate directory, configured a second Solr server to point to the new index, and fired both of them up. While the original server worked as before, the copy failed to find any documents.

When I do a directory listing on the servers, I see something slightly odd. This is the listing of the original index directory:

total 3581328
-rw-r--r-- 1 gene pal 2502676419 2010-07-08 20:53 _38.fdt
-rw-r--r-- 1 gene pal     488660 2010-07-08 20:53 _38.fdx
-rw-r--r-- 1 gene pal        198 2010-07-08 20:53 _38.fnm
-rw-r--r-- 1 gene pal  213752776 2010-07-08 20:54 _38.frq
-rw-r--r-- 1 gene pal     366496 2010-07-08 20:54 _38.nrm
-rw-r--r-- 1 gene pal  725677119 2010-07-08 20:54 _38.prx
-rw-r--r-- 1 gene pal    1124453 2010-07-08 20:54 _38.tii
-rw-r--r-- 1 gene pal   85260530 2010-07-08 20:54 _38.tis
-rw-r--r-- 1 gene pal     280471 2010-07-08 20:54 _38.tvd
-rw-r--r-- 1 gene pal  133020745 2010-07-08 20:54 _38.tvf
-rw-r--r-- 1 gene pal     977316 2010-07-08 20:54 _38.tvx
-rw-r--r-- 1 gene pal        299 2010-07-08 20:54 segments_1b
-rw-r--r-- 1 gene pal         20 2010-07-08 20:54 segments.gen

and this is the listing of the copy:

total 3577796
-rw-r--r-- 1 gene pal 2502676419 2010-07-10 23:16 _38.fdt
-rw-r--r-- 1 gene pal     488660 2010-07-10 23:15 _38.fdx
-rw-r--r-- 1 gene pal        198 2010-07-10 23:15 _38.fnm
-rw-r--r-- 1 gene pal  213752776 2010-07-10 23:15 _38.frq
-rw-r--r-- 1 gene pal     366496 2010-07-10 23:15 _38.nrm
-rw-r--r-- 1 gene pal  725677119 2010-07-10 23:16 _38.prx
-rw-r--r-- 1 gene pal    1124453 2010-07-10 23:16 _38.tii
-rw-r--r-- 1 gene pal   85260530 2010-07-10 23:15 _38.tis
-rw-r--r-- 1 gene pal     280471 2010-07-10 23:16 _38.tvd
-rw-r--r-- 1 gene pal  133020745 2010-07-10 23:16 _38.tvf
-rw-r--r-- 1 gene pal     977316 2010-07-10 23:16 _38.tvx
-rw-r--r-- 1 gene pal        299 2010-07-10 23:15 segments_1b
-rw-r--r-- 1 gene pal         20 2010-07-10 23:15 segments.gen

While the file sizes all look the same, the total shown at the top of each listing is different. And even though the solr startup messages suggest that it's looking at this directory

INFO: Opening new SolrCore at solr/, dataDir=./data/

the admin stats show no documents available to Solr.

What else should I look at to trouble-shoot this problem?

Thanks,

Gene

A: 

(I posted the original question before I had created a Stack Overflow account and can't figure out how to edit it, hence a second post rather than an edit or a comment. Sorry for the confusion.)

Yes, I copied the entire solr directory, and then edited the solrconfig.xml to point to the second index location. On startup, both solr instances report the correct dataDir locations.
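
As far as I know, the `total` line in `ls -l` counts allocated disk blocks rather than bytes, so it can differ between two copies with identical contents. A rough way to confirm the copy itself is intact is to compare checksums file by file. The sketch below is only an illustration; the function names `file_digest` and `compare_index_dirs` are made up for this example and are not part of Solr.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_index_dirs(original, copy):
    """Return sorted names of files that are missing on one side or differ in content."""
    names = set(os.listdir(original)) | set(os.listdir(copy))
    problems = []
    for name in sorted(names):
        left = os.path.join(original, name)
        right = os.path.join(copy, name)
        if not (os.path.isfile(left) and os.path.isfile(right)):
            # Present in only one directory (or not a regular file).
            problems.append(name)
        elif file_digest(left) != file_digest(right):
            # Same name, different bytes.
            problems.append(name)
    return problems
```

An empty list means every file exists in both directories with the same content, in which case the problem lies in the configuration rather than in the copy.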

Gene Golovchinsky
No errors at all in the log? There are no hidden files right? (i.e. try ls -a)
Mauricio Scheffer
Turns out it was my stupidity (which was the operating assumption all along): I specified the index location in the solrconfig.xml file as ./data rather than as ./solr/data because I copied it from the comment above.
Gene Golovchinsky
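
For anyone hitting the same thing, here is a minimal sketch of how one might check where a relative `dataDir` actually lands. It assumes that a relative path resolves against the directory Solr was started from and that the index lives in an `index` subdirectory of `dataDir`. The helper name `inspect_data_dir` is hypothetical. If the resolved directory holds no `segments*` files, Solr is most likely looking at an empty index, which would explain the missing documents without any error in the log.

```python
import glob
import os


def inspect_data_dir(data_dir, working_dir):
    """Resolve data_dir against working_dir and list segments files under its index/ subdirectory."""
    resolved = os.path.normpath(os.path.join(working_dir, data_dir))
    index_dir = os.path.join(resolved, "index")
    # A usable Lucene index has at least one segments_N file plus segments.gen.
    segments = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(index_dir, "segments*"))
    )
    return resolved, segments
```

Calling it once with `./data` and once with `./solr/data`, using the directory Solr was launched from as `working_dir`, shows which of the two actually contains the copied index.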
A: 

I have found that it's much easier to use Solr's built-in Replication API rather than copying index files around on the filesystem.

Create a new, empty core on your slave machine. Then instruct that core to replicate from the existing master using the fetchindex command. Solr will handle the rest of the heavy lifting for you.

For example, you will end up with a URL something like this:

http://slave_host:port/solr/corename/replication?command=fetchindex&masterUrl=http://master_host:port/solr/corename/replication
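
If you want to script it, a minimal sketch along these lines should work, assuming the `/replication` handler is enabled on both cores. The helper names `fetchindex_url` and `trigger_fetchindex` are made up for illustration, and the host names and port in the usage note are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def fetchindex_url(slave_core_url, master_core_url):
    """Build the fetchindex URL for a slave core, pointing it at the master's replication handler."""
    query = urlencode({
        "command": "fetchindex",
        # urlencode percent-escapes the nested URL so it survives as one parameter.
        "masterUrl": master_core_url.rstrip("/") + "/replication",
    })
    return slave_core_url.rstrip("/") + "/replication?" + query


def trigger_fetchindex(slave_core_url, master_core_url, timeout=30):
    """Send the fetchindex request to the slave and return the raw response body."""
    url = fetchindex_url(slave_core_url, master_core_url)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `trigger_fetchindex("http://slave_host:8983/solr/corename", "http://master_host:8983/solr/corename")` would ask the slave core to pull the index from the master; 8983 is only assumed here as the port.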

Nick Zadrozny