views:

277

answers:

1

I am currently gathering information on whether I should use Nutch, Solr, or Nutch together with Solr (domain: vertical web search). What would you suggest?

+3  A: 

Nutch is a framework for building web crawlers and search engines. It can handle the whole process, from collecting the web pages to building the inverted index. It can also push those indexes to Solr.

Solr is mainly a search engine, with support for faceted search and many other neat features. But Solr doesn't fetch the data; you have to feed it.
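To give an idea of what a faceted search looks like from the client side, here is a minimal sketch that only builds the query URL. `facet=true` and `facet.field` are standard Solr query parameters; the host, the core name `pages`, the field name `site`, and the helper `facet_query_url` are made up for illustration.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, core, query, facet_field, rows=10):
    """Build a Solr select URL that also asks for facet counts on one field.

    base_url, core and facet_field are placeholders for your own setup.
    """
    params = {
        "q": query,                  # the actual search query
        "rows": rows,                # number of documents to return
        "facet": "true",             # turn faceting on
        "facet.field": facet_field,  # field to count distinct values for
        "wt": "json",                # ask for a JSON response
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance with a core called "pages".
print(facet_query_url("http://localhost:8983/solr", "pages", "title:python", "site"))
```

Fetching that URL from a running Solr would return the matching documents together with a count per value of the faceted field, which is what a vertical search UI typically uses for its "filter by" sidebar.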

So maybe the first thing to ask when choosing between the two is whether the data to be indexed is already available (in XML, in a CMS, or in a database). In that case, you should probably just use Solr and feed it that data. On the other hand, if you have to fetch the data from the web, you are probably better off with Nutch.
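"Feeding" Solr can be as small as an HTTP POST. The following is a rough sketch, assuming a Solr version that accepts JSON on its `/update` handler (older releases expect XML instead). The host, the core name `pages`, the field names `id`, `title` and `content`, and both helper functions are assumptions for illustration, not something your setup necessarily matches.

```python
import json
import urllib.request


def build_update_request(base_url, core, docs):
    """Prepare a POST that sends a list of documents to Solr's update handler.

    commit=true makes the documents searchable right away; for bulk loads
    you would normally commit less often.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_documents(base_url, core, docs):
    """Send the documents; needs a running Solr, so it is not called below."""
    with urllib.request.urlopen(build_update_request(base_url, core, docs)) as resp:
        return resp.status


# Hypothetical documents, e.g. rows exported from a CMS or database.
docs = [{"id": "http://example.com/a", "title": "Page A", "content": "Example body text"}]
req = build_update_request("http://localhost:8983/solr", "pages", docs)
print(req.get_method(), req.full_url)
```

The request is built separately from the network call so you can inspect or log what would be sent before pointing it at a real index.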

Pascal Dimassimo
I have to fetch the data from the web, but in a more sophisticated way than Nutch's crawler does. As far as I know, it is very difficult to modify Nutch's crawler (for example, to ignore robots.txt, detect JS redirects, and so on). So is Solr my choice? What can Solr do that Nutch can't?
Jeriho
Like I said, Solr is a search engine. There is nothing in it to crawl the web. But if you have a proprietary crawler that works well for you, it should be easy to push the data to Solr.
Pascal Dimassimo