Hello,

Here is the situation: we have to provide a customer with a web-based search engine that searches for a given string inside a list of documents whose paths are stored in a database.

The supported documents are PDF, Word, Excel, TXT.

We have two options for the implementation:

  • PHP
  • ASP

Has anyone heard of any good open-source solutions for this?

Thanks!

EDIT: The documents are intranet-only, so having Google index them is not a viable solution.

A: 

You're probably looking for Lucene:

http://wiki.apache.org/lucene-java

It's not written in PHP or ASP, but it's one of the best search engines you're going to get.

That is, assuming you can't just get Google to index the content for you.

Paul McMillan
+3  A: 

Have you considered Lucene? Whilst it is Java-based, there are ports to other platforms, so you can pick one for your platform of choice.

Your solution would require two parts: an indexer (which would regularly trawl through the documents listed in your DB and build the appropriate indexes) and your search app (which would be web-based and query the index for matching documents).
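To make the indexer/search split concrete, here is a minimal sketch in plain Python. It is not Lucene and not how Lucene is implemented; it only illustrates the idea of building an inverted index once and querying it later. The paths and texts are made up, and the sketch assumes the text has already been extracted from each file (for PDF, Word and Excel you would need a separate extraction step, and the paths would come from your database).

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [t.lower() for t in TOKEN.findall(text)]


def build_index(documents):
    """Indexer: map each term to the set of document paths containing it.

    `documents` is a dict of path -> already-extracted text (an assumption;
    a real indexer would read the paths from the DB and extract the text).
    """
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index, query):
    """Search app: return the sorted paths that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample data standing in for the documents logged in the DB.
docs = {
    "/docs/report.txt": "Quarterly budget report for the intranet team",
    "/docs/notes.txt": "Meeting notes about the budget",
    "/docs/manual.txt": "User manual for the search engine",
}
index = build_index(docs)
print(search(index, "budget report"))
```

The web front end would only ever call something like `search`, while the indexer runs separately on a schedule, which is the same division of labour you get with Lucene.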

Lucene seems to be the de facto choice at the moment. Also, there is plenty of information floating around SO (and enough experts, myself excluded, to help you out if you get stuck!)

Good luck!

Sam Pride
A: 

If you are on a Microsoft platform, then the Microsoft Index Service is a really good solution. I used it at one company for their whole intranet and it worked like a charm. It took me half a day to get it up and running.

If you want the Index Service to index PDFs as well, you need to install a small tool from Adobe called iFilter.

The nice thing is that the Index Service is available on every Windows Server installation, which saves you the trouble of installing anything extra.

Michal