I would like to implement a search engine that crawls a set of websites, extracts specific information from each page, and builds a full-text index of that information.
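To make the extract-and-index part concrete, here is a rough sketch of the pipeline using only the Python standard library. It makes several assumptions for illustration: the "specific information" is taken to be the text of the `<title>` and `<h1>` tags, the crawler is assumed to have already produced a mapping of URL to raw HTML, and a tiny in-memory inverted index stands in for the real search library (Xapian or Solr). The names `FieldExtractor`, `extract`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects text inside <title> and <h1> tags (an illustrative choice)."""

    WANTED = {"title", "h1"}

    def __init__(self):
        super().__init__()
        self._stack = []   # wanted tags that are currently open
        self.chunks = []   # extracted text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Keep text only while we are inside one of the wanted tags.
        if self._stack and data.strip():
            self.chunks.append(data.strip())


def extract(html):
    """Return the extracted fields of one page as a single string."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages):
    """Build a term -> set of URLs index from a mapping of URL -> raw HTML.

    This stands in for the real indexing step; a search library would be
    fed the extracted text here instead.
    """
    index = defaultdict(set)
    for url, html in pages.items():
        for term in extract(html).lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query term (simple AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

The point of the sketch is the separation of stages: the crawler only has to hand over URL and HTML pairs, the extractor decides what counts as indexable text, and the indexing backend can be swapped without touching the other two.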
It seems to me that Xapian could be a good choice for the search engine library.
What are the options for a crawler/parser that integrates with Xapian?
Would Solr be a better choice than Xapian for integrating with open-source crawlers/parsers?