I am in the process of building a corporate web site. We are looking for an open source or paid search engine based on ASP.NET. It should be able to:

  1. Search the web content of all the pages in the site.
  2. Search all Office documents, etc.
  3. Filter search results based on user type and similar criteria.

Please let me know what kind of tools or software we need to consider.

+5  A: 

Try Lucene.NET

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to the C# and .NET platform utilizing the Microsoft .NET Framework.

Here are some tutorials to get you started:

Andreas Grech
+1  A: 

Solr is another great option; it's effectively a facade atop Lucene that provides a nice REST/URL-based API. There's a mature .NET library for working with it too.

http://lucene.apache.org/solr/

http://code.google.com/p/solrnet/

From your question, though: are you looking for just the actual underlying engine, or also for something to crawl/traverse your content and build up the indexes of your chosen search engine?

--

Edit: replying to the original poster's comment.

You have two halves of an equation to solve, then.

The first half is picking a search engine that takes input (keywords), queries its indexes, and returns what it believes to be pertinent matches. The second half is finding a mechanism to populate the chosen engine's search index.

As far as the engine is concerned, Lucene has been suggested, and I suggested a variant of Lucene that provides an (arguably) improved developer interface. Building your search corpus is a bit different. Here you could write your own software that takes a piece of content and adds it to the index. The advantage is fine-grained control over what goes into the search engine and when; the downside is that you're writing new code--fortunately, modern search engines like Lucene/Solr make that pretty easy.
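To make the "add a piece of content to the index" idea concrete, here is a toy inverted index in plain Python. This is only a conceptual sketch (the class and document names are made up); Lucene/Solr do the same job with real analyzers, scoring, and on-disk storage:

```python
from collections import defaultdict

class ToyIndex:
    """Minimal inverted index: maps each term to the set of doc ids containing it."""
    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.docs = {}                    # doc_id -> original text

    def add(self, doc_id, text):
        # "Analysis" step: lowercase and split on whitespace.
        # Real analyzers also handle stemming, stop words, tokenization, etc.
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND query: return only docs containing every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = self.postings[terms[0]].copy()
        for term in terms[1:]:
            result &= self.postings[term]
        return result

index = ToyIndex()
index.add("about.html", "Our corporate mission and history")
index.add("products.html", "Corporate products and services catalog")
print(index.search("corporate products"))  # {'products.html'}
```

Your own indexing code would do essentially this, except calling the Lucene.NET or Solr API instead of a homemade class, which is why fine-grained control comes cheap: you decide exactly which pages and documents get `add`ed and when.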

Your second option is to use something that automatically crawls your content and adds it to the index. The problems here are identifying an appropriate crawler and learning to configure it. Depending on your choice of crawler, it may or may not do a good job of indexing documents sitting on a file system (say, in a corporate SharePoint site).

Nutch is a crawler from Apache (makers of Lucene and Solr) that could potentially be used if you opt not to write your own code: http://wiki.apache.org/nutch/
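If you do end up rolling a simple crawler rather than configuring Nutch, the core of it is just fetching pages, pulling out the links to follow, and extracting the text to index. A hedged sketch using only the Python standard library (the network-fetch step is omitted so this runs on a static page; a real crawler would fetch each link with something like urllib and track visited URLs):

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collects outgoing links (to crawl next) and visible text (to index)."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only visible, non-blank text for the search index.
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())

html = """<html><body>
  <a href="/about.html">About us</a>
  <p>Welcome to our corporate site.</p>
  <script>ignore_me();</script>
</body></html>"""

parser = PageExtractor()
parser.feed(html)
print(parser.links)                    # ['/about.html']
print(" ".join(parser.text_parts))     # About us Welcome to our corporate site.
```

Each extracted text blob then gets handed to your indexer, and each discovered link goes into the crawl queue. Nutch handles all of this (plus politeness, dedup, and document parsing) out of the box, which is why configuring it is usually preferable to writing your own.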

bakasan
We are building a corporate site where any visitor should be able to search all the web content.