I am looking for a very robust software search engine to integrate in a .Net web site.

The currently proposed solution is Lucene.NET, a stack based on Lucene. However, I would like to evaluate other search engines before making up my mind.

The feature set we need is the following:

  • Ability to crawl arbitrary pages via HTTP
  • Ability to parse sitemaps
  • Ability to get lists of URIs to parse via a database look-up
  • Ability to restrict the search to a particular language/locale
  • Ability to restrict the search to a subset of the pages (e.g. via a regex on the URI)
  • Speed and scalability (this is for a public website with a ton of traffic)
  • Must have .NET API support or a super-easy HTTP-based API that can be wrapped in a .NET API
  • Language-dependent full-text support

Other things which would be great, but not deal-breakers if they aren't supported:

  • Reporting
  • Aliasing and biasing of results
  • HTTP-based administration pages
  • SQL Server support

What other software search engines have worked for you? Are there any you would recommend or that we should avoid?

+1  A: 

I'd recommend checking out Solr. It's Java-based, but it meets the HTTP-based API leg of your requirements, is designed to run on a separate box/cluster from your primary app (so you don't necessarily need Java AND .NET on the same hardware), and has a lot of momentum. It's been a while since I worked with it, but I don't remember it providing its own crawler. If that's still the case, it should be straightforward to use a standalone crawler and the aforementioned API to make it work.
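
To make the "standalone crawler plus HTTP API" idea concrete, here is a minimal sketch in Python (chosen only because the API is plain HTTP and language-neutral; the same requests can be issued from .NET). It assumes a Solr core reachable at a hypothetical `http://localhost:8983/solr/pages` and a schema with `url`, `language` and `content` fields. None of those names come from Solr itself, they are placeholders for whatever your deployment defines. The functions only build the requests, so nothing here talks to a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr location and core name; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr/pages"


def build_update_request(docs, base=SOLR_BASE):
    """Build (but do not send) a POST that adds documents and commits them."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        base + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_select_url(text, language=None, base=SOLR_BASE, rows=10):
    """Build a query URL, optionally restricted to one language via a filter query."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if language:
        # "language" is an assumed schema field, not something Solr provides by default.
        params.append(("fq", "language:" + language))
    return base + "/select?" + urlencode(params)
```

A crawler would pass its extracted pages to `build_update_request` and send the result with `urllib.request.urlopen`; the site would then fetch the URL from `build_select_url("some words", language="en")` and parse the JSON response.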

Hank Gay
+1  A: 

Instead of using Lucene.Net directly, have you considered using something that wraps it and provides more functionality akin to what you're after?

Solr is an Apache product that does this, and there is also a .Net client library for it. I've never used it in production, but it looks like the type of thing you're after.

Along a similar idea is Nutch (written by the guy who originally wrote Lucene), although I'm not aware of any .Net version of it. Nutch does have a spider component to crawl sites.

adrianbanks
Solrnet is not a .net port of Solr, it's a client library.
Mauricio Scheffer
@Mauricio: updated answer
adrianbanks
A: 

Lucene is the only one I know of, but it would require you to write a fair bit of what you wanted yourself.

Burt
+1  A: 

Like others have said, definitely go with the original Lucene using Solr. Integrating it with .Net is super simple. I actually just recently blogged about this: http://crazorsharp.blogspot.com/2010/01/full-text-search-using-solr-lucene-and.html

BFree
+3  A: 

Lucene.Net is an information retrieval library, not a search engine. In particular it won't do any of:

  • Crawl web pages or parse sitemaps
  • Reporting
  • HTTP-based administration pages
  • SQL Server support (Lucene.Net uses its own simple but highly effective file format, and doesn't use SQL Server)

Although I'm a strong supporter of Lucene.Net and would recommend it as the full-text search component of a search engine, you will also need a crawler / HTML parser component in order to create a fully functional search engine, and you are going to have to carefully design your Lucene.Net indexes to maximise the performance of the queries that you want (searching by language/locale).
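
As a rough illustration of what that crawler / HTML parser component has to do, the Python sketch below reads page URLs from a sitemap and turns a fetched page into a flat document with a language field that an index could later filter on. It is only a sketch under stated assumptions: the document shape (`id`, `url`, `language`, `content`) is hypothetical, the language is taken naively from the `lang` attribute of the `<html>` tag, and fetching the pages over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


class TextExtractor(HTMLParser):
    """Collect visible text and the <html lang="..."> attribute."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.lang = None
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            self.lang = dict(attrs).get("lang")
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_to_doc(url, html):
    """Build an index-ready document (hypothetical field names)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"id": url, "url": url, "language": parser.lang,
            "content": " ".join(parser.parts)}
```

Storing the language as its own field, rather than burying it in the text, is what makes a later "search only this locale" restriction cheap, whichever index ends up holding the documents.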

Try looking at the Solr project, which is a fully fledged search engine built on Lucene. It might be better suited to your needs.

Kragen
+2  A: 

Check out Microsoft's Search Server Express, although the page looks screwed up at the moment so try this link.

There are other enterprise engines out there, such as Vivisimo Velocity (very extensible), Autonomy, etc. Lucene and Solr are limited and hard to use and configure, but that's what you get when you want something free.

sw
+1  A: 

Coveo is the search engine that we are currently putting in to replace a Google Mini that was used for a number of years. I'm just pointing these out as something to explore, as I haven't used either enough to know how good they are. I just know of headaches with each, many, many headaches.

JB King