Does solr do web crawling?

tags:

solr

views:

506

answers:

Hi, I am interested in doing web crawling and have been working with Solr. Does Solr do web crawling, or if not, what are the steps to do web crawling?

+2 A:

Solr does not, in and of itself, have a web crawling feature.

Nutch is the de facto crawler (and then some) for Solr.

mjv 2009-11-23 05:30:13

Solr does not do web crawling; it is a search server that provides full-text search capabilities and is built on top of Lucene.

If you need to crawl web pages then you have a number of options including:

Nutch - http://lucene.apache.org/nutch/
Websphinx - http://www.cs.cmu.edu/~rcm/websphinx/
JSpider - http://j-spider.sourceforge.net/
Heritrix - http://crawler.archive.org/

If you want to use the search facilities provided by Lucene or Solr, you'll need to build indexes from the web crawl results.
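As a rough illustration of that indexing step, here is a minimal Python sketch that turns crawled pages into a Solr XML update message and posts it to Solr's update handler. The field names (id, title, content), the page dictionary keys, the helper function names and the update URL are all assumptions made for this example; the fields have to match whatever your Solr schema defines, and the URL depends on your Solr version and setup.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(pages):
    """Build a Solr <add> update message from crawled pages.

    `pages` is a list of dicts with hypothetical keys: url, title, text.
    """
    add = ET.Element("add")
    for page in pages:
        doc = ET.SubElement(add, "doc")
        # Field names are assumptions; they must exist in your Solr schema.
        for name, key in (("id", "url"), ("title", "title"), ("content", "text")):
            field = ET.SubElement(doc, "field", name=name)
            field.text = page[key]
    # Returns UTF-8 encoded bytes, ready to be used as an HTTP request body.
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(xml_bytes, update_url="http://localhost:8983/solr/update?commit=true"):
    """POST the update message to Solr (the URL assumes a local install)."""
    request = urllib.request.Request(
        update_url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a crawler producing the page dicts, something like post_to_solr(build_add_xml(pages)) would send them for indexing. A crawler such as Nutch can handle this hand-off for you, so treat this only as a picture of what "building the index" involves.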

See also:

http://stackoverflow.com/questions/1580882/lucene-crawler-it-needs-to-build-lucene-index

Jon 2009-11-23 05:35:59

Definitely Nutch! Nutch also has a basic web front end that lets you query your search results, so depending on your requirements you might not even need Solr. If you do go with a Nutch/Solr combination, you should be able to take advantage of the recent work done to integrate Solr and Nutch: http://issues.apache.org/jira/browse/NUTCH-442

wmitchell 2009-11-23 05:45:59
