I would like to implement a search engine that crawls a set of web sites, extracts specific information from the pages, and creates a full-text index of that information.

It seems to me that Xapian could be a good choice for the search engine library.

What are the options for a crawler/parser to integrate with Xapian?

Would Solr be a better choice than Xapian to integrate with open source crawlers/parsers?

A: 

Here's a little comparison between Xapian and Solr.

But if you want to build a crawler, take a look at Nutch. It's extensible with plugins, so you could write a plugin that analyzes the information that you're looking for.
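To make the "extract specific information, then index it" step concrete, here is a rough sketch in plain Python. It is not Nutch plugin code and it does not use the Xapian or Solr APIs; it only uses the standard library, and the names `HeadingExtractor`, `extract`, `build_index` and `search`, as well as the choice of `<h1>`/`<h2>` as the "specific information", are assumptions made up for illustration. A real setup would hand the extracted text to the search library instead of the in-memory dictionary used here.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Keeps only the text found inside <h1> and <h2> tags (hypothetical choice)."""

    WANTED = {"h1", "h2"}

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while we are inside a wanted tag
        self.chunks = []     # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.WANTED and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def extract(html):
    """Return the heading text of one page as a single string."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(c.strip() for c in parser.chunks if c.strip())


def build_index(pages):
    """pages maps URL -> HTML. Returns an inverted index: term -> set of URLs."""
    index = defaultdict(set)
    for url, html in pages.items():
        for term in re.findall(r"[a-z0-9]+", extract(html).lower()):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs whose extracted text contains every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


# Stand-in for crawled pages; a crawler would supply these.
pages = {
    "http://a.example/": "<h1>Xapian tips</h1><p>Solr is mentioned only here</p>",
    "http://b.example/": "<h1>Solr tips</h1><h2>Setup</h2>",
}
index = build_index(pages)
```

With this, `search(index, "solr")` only matches the second page, because the paragraph text of the first page was never extracted and so never indexed. That is the point of doing the extraction in the crawler/parser stage rather than indexing whole pages.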

Mauricio Scheffer
A: 

Flax may provide some of what you're looking for.

http://www.flax.co.uk/index.shtml

Rob Young