views:

506

answers:

3

Hi, I am interested in doing web crawling. I have been working with Solr. Does Solr do web crawling, or what are the steps to do web crawling?

+2  A: 

Solr does not, in and of itself, have a web crawling feature.

Nutch is the de facto crawler (and then some) for Solr.

mjv
A: 

Solr does not do web crawling; it is a search server that provides full-text search capabilities, and it is built on top of Lucene.

If you need to crawl web pages then you have a number of options including:

If you want to make use of the search facilities provided by Lucene or Solr, you'll need to build indexes from the web crawl results.
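To give a rough idea of what "building indexes from the crawl results" can look like, here is a minimal sketch in Python using only the standard library. It is not how Nutch does it. The names `PageParser`, `page_to_doc` and `post_to_solr` are made up for illustration, the `content` field is assumed to exist in your Solr schema, and the update URL you pass in (something like `http://localhost:8983/solr/mycore/update`, core name hypothetical) assumes a Solr version whose update handler accepts JSON.

```python
import json
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects outgoing links and text nodes from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Note: script/style contents are not filtered out in this sketch.
        if data.strip():
            self.text_parts.append(data.strip())


def page_to_doc(url, html):
    """Turn one crawled page into (solr_doc, outgoing_links)."""
    parser = PageParser(url)
    parser.feed(html)
    # "id" and "content" are assumed field names; adjust to your schema.
    doc = {"id": url, "content": " ".join(parser.text_parts)}
    return doc, parser.links


def post_to_solr(docs, update_url):
    """POST a list of documents as JSON to a Solr update handler and commit."""
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

The links returned by `page_to_doc` are what a crawler would push onto its queue of pages to fetch next. A real crawler also needs robots.txt handling, politeness delays and duplicate detection, which is exactly the part a dedicated crawler takes care of for you.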

See also:

http://stackoverflow.com/questions/1580882/lucene-crawler-it-needs-to-build-lucene-index

Jon
A: 

Definitely Nutch! Nutch also has a basic web front end that lets you query your crawled index. Depending on your requirements, you might not even need to bother with Solr. If you do go with a Nutch/Solr combination, you should be able to take advantage of the recent work done to integrate Solr and Nutch: http://issues.apache.org/jira/browse/NUTCH-442

wmitchell