Is there a proper way to keep Solr documents in sync with database records? I regularly run into the same problem: Solr contains documents that refer to database records that no longer exist. It seems some records were deleted from the database without anything triggering an update to Solr. I want to write a rake task that runs periodically and removes these orphaned documents from Solr.

Any suggestions?

Chamnap

A: 

I'm using Java + Java DB + Lucene (which Solr is built on) for text search and database records. My solution is to back up and then recreate (delete + create) the Lucene index so that it matches my records in Java DB. This seems to be the easiest approach; the only problem is that it is not advisable to run it often, which also means the index is not updated in real time. I run the batch job nightly, so all changes show up the next day. Hope this helps.

I also read an article about syncing Solr and DB records here, under "No synchronization". It states that it's not easy, but possible in some cases. It would be helpful if you specified your programming language so more people can help you.

Manny
It takes quite a long time to generate the full index from my database. I can't do it nightly, since it would take more than a day.
Chamnap
I see. For Java, I found some references at http://www.mail-archive.com/[email protected]/msg24663.html and http://wiki.apache.org/solr/DataImportHandler ; for Ruby on Rails, see http://coderkitty.sweetperceptions.com/2009/3/27/removing-out-of-sync-error-in-acts_as_solr
Manny
A: 

Yes, there is one.

You can use the DataImportHandler with its delta-import feature.

Basically, you specify a query that picks up only the rows that have been modified since the last import, instead of rebuilding the whole index. Here's an example.
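
Since the question mentions a periodic rake task, here is a minimal Ruby sketch of how such a task might build the request that triggers a delta import. The base URL is an assumption (host, port, and handler path depend on how the DataImportHandler is registered in your Solr setup), and `delta_import_uri` is a hypothetical helper name. Note that a delta import only removes documents for deleted rows if the DataImportHandler configuration also defines a `deletedPkQuery`.

```ruby
require "uri"

# Hypothetical base URL: adjust host, port, and handler path to your setup.
SOLR_DATAIMPORT_URL = "http://localhost:8983/solr/dataimport"

# Build the URL that asks the DataImportHandler to run a delta import
# and commit the result. Nothing is sent over the network here.
def delta_import_uri(base_url = SOLR_DATAIMPORT_URL)
  uri = URI.parse(base_url)
  uri.query = URI.encode_www_form(command: "delta-import", commit: "true")
  uri
end
```

A scheduled task could then issue a GET request against `delta_import_uri`, for example with `Net::HTTP.get`.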

Otherwise, you can add a feature to your application that deletes the record from your DB and, in the same step, removes the corresponding document from your index via HTTP.
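
As a rough sketch of the HTTP side, the snippet below builds a Solr delete-by-id update message and shows one way it might be posted. The update URL is an assumption about your installation, and `solr_delete_xml` and `post_solr_delete` are hypothetical helper names, not part of any Solr client library.

```ruby
require "net/http"
require "uri"

# Hypothetical update endpoint: adjust to your Solr installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Build the XML body of a delete-by-id update message,
# escaping characters that are special in XML text.
def solr_delete_xml(ids)
  id_tags = ids.map { |id| "<id>#{id.to_s.gsub(/[&<>]/, XML_ESCAPES)}</id>" }.join
  "<delete>#{id_tags}</delete>"
end

# Send the delete message to Solr. Only call this against a running server.
def post_solr_delete(ids, url = SOLR_UPDATE_URL)
  Net::HTTP.post(URI.parse(url), solr_delete_xml(ids), "Content-Type" => "text/xml")
end
```

Calling `post_solr_delete` right after the database delete keeps the two stores aligned, provided the HTTP call succeeds; a failed call would still leave an orphaned document behind.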

volothamp
A: 

In addition to the above, "soft" deletion by setting a deleted or deleted_at column is a great approach. That way you can run a script to periodically clear out deleted records from your Solr index as needed.
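
The cleanup script could be sketched roughly as follows. `Record` is a hypothetical stand-in for a database row with a `deleted_at` column, and `ids_to_purge` is an illustrative helper; in a real app the indexed IDs would come from a Solr query and the records from your database.

```ruby
require "set"

# Hypothetical stand-in for a database row with a soft-delete column.
Record = Struct.new(:id, :deleted_at)

# Return the indexed IDs that should be removed from Solr:
#  - IDs whose row is soft-deleted (deleted_at is set), and
#  - IDs with no row at all (deleted without a Solr update).
def ids_to_purge(indexed_ids, records)
  live_ids = records.reject(&:deleted_at).map(&:id).to_set
  indexed_ids.reject { |id| live_ids.include?(id) }
end
```

Comparing against the set of live IDs also catches documents whose rows were removed outright, which is the situation described in the question.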

You mention using a rake task; is this a Rails app you're working with? Most Solr clients for Rails apps should support deleting records via an after_destroy hook.

Nick Zadrozny
Yes, I use the activemessaging gem in Rails, but I can't manage the poller script very well.
Chamnap