
Hi, when Nutch finishes its cycle (crawl, fetch, parse, index), I do not want it to build a Lucene index during the index phase. Instead, I want my own code to store all the crawled data (I believe Nutch keeps it as NutchDocument objects) in MySQL.

Is there any way to do this?

Thanks

A: 

Create your own Java class that manages the Nutch cycle. It should be similar to org.apache.nutch.crawl.Crawl, but replace the call to the indexer with a call to your MySQL connector. Alternatively, you can call your MySQL connector during each cycle. Which option fits depends on whether you want to update MySQL once at the end of the crawl or while it is running.
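A minimal sketch of such a driver, showing both options side by side. It is not Nutch code: the actual generate, fetch, parse and updatedb calls differ between Nutch versions, so they are hidden behind a hypothetical `CrawlSteps` interface, and `CrawledDoc`, `DocumentSink` and `MySqlCrawlDriver` are illustrative names, not part of Nutch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical crawl driver, loosely modelled on the per-round loop of
 * org.apache.nutch.crawl.Crawl. The indexing step is replaced by a call
 * to a DocumentSink (for example a MySQL writer).
 */
public class MySqlCrawlDriver {

    /** Hypothetical stand-in for a crawled page (e.g. fields copied from a NutchDocument). */
    public static final class CrawledDoc {
        public final String url;
        public final Map<String, String> fields;

        public CrawledDoc(String url, Map<String, String> fields) {
            this.url = url;
            this.fields = fields;
        }
    }

    /** Hypothetical wrapper: runs generate, fetch, parse and updatedb for one round. */
    public interface CrawlSteps {
        List<CrawledDoc> runRound(int round);
    }

    /** Hypothetical destination for crawled data, used instead of the indexer. */
    public interface DocumentSink {
        void write(List<CrawledDoc> docs);
    }

    private final CrawlSteps steps;
    private final DocumentSink sink;
    private final boolean writeEachRound;

    public MySqlCrawlDriver(CrawlSteps steps, DocumentSink sink, boolean writeEachRound) {
        this.steps = steps;
        this.sink = sink;
        this.writeEachRound = writeEachRound;
    }

    /** Runs `depth` crawl rounds, then hands the documents to the sink. */
    public void crawl(int depth) {
        List<CrawledDoc> collected = new ArrayList<>();
        for (int round = 0; round < depth; round++) {
            List<CrawledDoc> docs = steps.runRound(round);
            if (writeEachRound) {
                // Option 2: push each round's documents to MySQL while the crawl runs.
                sink.write(docs);
            } else {
                // Option 1: keep the documents until the crawl is finished.
                collected.addAll(docs);
            }
        }
        if (!writeEachRound) {
            // Option 1: where Crawl would call the indexer, write everything once.
            sink.write(collected);
        }
    }
}
```

Writing per round keeps memory bounded and makes partial results visible early. Writing once at the end is the smaller change to the existing Crawl flow, since it only swaps out the indexer call.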

Pascal Dimassimo
Thanks. I will look into both options. For now, replacing the call to solrIndexer with my custom mysqlIntegrator looks more feasible, though it will then only be called at the end of the cycle.
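A minimal sketch of what a mysqlIntegrator called in place of the indexer might look like, using plain JDBC batch inserts. It assumes each crawled page has already been reduced to a map of field name to value (for example copied out of a NutchDocument), and the `crawled_pages` table and its `url`, `title`, `content` columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical MySQL writer used instead of the Solr indexer.
 * Assumed schema: crawled_pages(url VARCHAR(2048), title TEXT, content MEDIUMTEXT).
 */
public class MySqlIntegrator {

    private static final String INSERT_SQL =
        "INSERT INTO crawled_pages (url, title, content) VALUES (?, ?, ?)";

    private final Connection connection;

    /** The caller owns the connection and is responsible for closing it. */
    public MySqlIntegrator(Connection connection) {
        this.connection = connection;
    }

    /** Queues one INSERT per document, sends them as a single JDBC batch, returns the count. */
    public int write(List<Map<String, String>> docs) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            for (Map<String, String> doc : docs) {
                ps.setString(1, doc.get("url"));
                // Missing fields are stored as empty strings in this sketch.
                ps.setString(2, doc.getOrDefault("title", ""));
                ps.setString(3, doc.getOrDefault("content", ""));
                ps.addBatch();
            }
            ps.executeBatch();
        }
        return docs.size();
    }
}
```

The connection would typically come from `DriverManager.getConnection("jdbc:mysql://localhost:3306/crawl", user, password)` with MySQL Connector/J on the classpath; the database name here is also just an example.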