Hello everyone,

I have been asked to either deploy or develop an enterprise (intranet) search engine that can index all the web pages on a couple of internal servers and provide a search portal that displays all related content, much like Google does, but for an intranet.

Any advice on how to develop or deploy one quickly? I have heard of Microsoft's FAST product, but I am not sure whether it serves this purpose.

Thanks in advance, George

+2  A: 

The Google Search Appliance is a hardware solution that you might be interested in checking out.

A software-based approach could be the Lucene search engine.
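To give a feel for what a keyword search library does at its core, here is a minimal sketch of an inverted index in Python. It is not Lucene's API or implementation; the function names (`tokenize`, `build_index`, `search`) and the `intranet.example` URLs are made up for illustration, and a real engine adds crawling, analysis, ranking, and persistence on top of this idea.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, query):
    """Return the pages that contain every query term (boolean AND), sorted by URL."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

# Hypothetical intranet pages, used only to exercise the sketch.
pages = {
    "http://intranet.example/hr/leave": "Annual leave policy and holiday request form",
    "http://intranet.example/it/vpn": "VPN setup guide for remote access",
    "http://intranet.example/hr/remote": "Remote work policy",
}
index = build_index(pages)
print(search(index, "remote policy"))
```

This only answers "which pages contain these words"; it does no ranking at all, which is the part the comments below are asking about.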

lomaxx
Cool. Do both of them have built-in relevance and ranking algorithms?
George2
I don't think Lucene is that sophisticated. It's just a very good keyword searcher. (Not knocking it, I've used it on more than one project.)
Rex M
+2  A: 

A free Microsoft solution is Microsoft Search Server Express. It works similarly to the search in SharePoint.

pb
It looks like Microsoft Search Server Express can only crawl content from SharePoint and has to run on top of SharePoint?
George2
It can index content on file servers, Web sites, Windows SharePoint Services, Microsoft Office SharePoint Server, Exchange Server public folders, and Lotus Notes repositories. It is also a standalone install.
pb
Thanks pb! This is just what I want. If I need to customize the ranking or some other part of the relevance matching, are there any APIs?
George2
Don't know. I have only used the out-of-the-box (OOB) functionality.
pb
+3  A: 

Depending on the level of polish you need, the Nutch project would be an almost turn-key solution for you. http://lucene.apache.org/nutch/

Kevin Peterson
What do you mean "level of polish you need"?
George2
You'll probably need to write your own front end. I'm guessing, but judging from related tools (Solr), the interface is probably going to look like something only an engineer could use.
Kevin Peterson
Thanks! This is just what I want. If I need to customize the ranking or some other part of the relevance matching, does Nutch provide any APIs? Is it easy to extend? My requirement is to develop a language-specific and industry-specific search, so I need some special keyword extraction, ranking, etc. Any advice?
George2
A: 

George,

It sounds like you're in a big hurry.

You'd better start setting expectations about re-work, re-work, re-work.

I highly recommend that you spend time now to

  • establish the requirements, possibly as basic, middle and blue-sky

  • determine which search engines, front-ends, crawlers, etc. (either open-source or vendor-provided) can really meet your requirements

  • determine the available support for those tools, and the likelihood of getting timely and workable answers or work-arounds (open-source, at least, doesn't come with a support contract)

  • don't try to do it all at once. Do the smallest data set first, regardless of how far up in management your sponsor is. That way you won't spend months on tests only to discover a fatal large-scale flaw in the system or in your plan

  • communicate with your team and sponsors by creating a roadmap to your various levels of requirements, with check-points

  • As far as pre-planning goes for even a small-to-medium corporate search project, I highly recommend Martin White's 'Making Search Work'.

http://www.amazon.com/Making-Search-Work-Implementing-Enterprise/dp/1573873055/ref=sr_1_1?ie=UTF8&qid=1249009370&sr=8-1

I think you'll find that ranking and relevance are among the iffiest parts of getting a good search solution delivered. Engines probably provide similar functionality, but the details of how to use it will differ, AND more importantly, the success you have with forcing relevance will only partly be a function of the search engine you pick. Put another way, if your text is not in harmony with the search engine's algorithm, you'll spend a lot of time trying to understand the various tuning parameters and their combinations. (I'm only familiar with 2 engines so far, so others are welcome to contradict this.)
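To make the point about tuning parameters concrete, here is a small Python sketch of a TF-IDF style scorer with a single knob, a title boost. It is not the formula of any particular engine; the names `score`, `rank`, and `title_boost` and the two sample documents are assumptions made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, doc, docs, title_boost=2.0):
    """Sum TF-IDF over the query terms; title matches count title_boost times."""
    n = len(docs)
    title_tf = Counter(tokenize(doc["title"]))
    body_tf = Counter(tokenize(doc["body"]))
    total = 0.0
    for term in tokenize(query):
        # Document frequency: how many documents mention the term anywhere.
        df = sum(1 for d in docs if term in tokenize(d["title"] + " " + d["body"]))
        if df == 0:
            continue
        idf = math.log(1 + n / df)
        total += idf * (title_boost * title_tf[term] + body_tf[term])
    return total

def rank(query, docs, title_boost=2.0):
    """Return the documents ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(query, d, docs, title_boost), reverse=True)

# Hypothetical documents, chosen so that the boost value decides the order.
docs = [
    {"id": "a", "title": "Expense policy", "body": "How to file travel claims"},
    {"id": "b", "title": "Travel FAQ", "body": "expense expense expense limits for trips"},
]
print([d["id"] for d in rank("expense", docs, title_boost=2.0)])
print([d["id"] for d in rank("expense", docs, title_boost=5.0)])
```

With one knob and two documents, the order already flips depending on the boost value. A real engine has many such parameters, and whether a given setting helps depends on how your own content is written, which is why the tuning takes so long.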

It's a great learning experience. Good luck.

A: 

FAST is a great enterprise search product. It usually ranks near the top in consulting firms' evaluations. It does require a moderate amount of technical setup and support, though.

Google is another solid product, but it is very expensive. It requires less technical support, but it also gives you less control over the search results.

DMurph11