The website is almost entirely DHTML/XHTML and is hosted on a Linux/Apache server.

While I'm not opposed to using a database, I've been told that I can implement a solution that parses the HTML documents and returns my search results without mucking about too much with ASP/PHP/CGI (in which I am most certainly a novice).

Is this possible? Is there a better way? Should I look to a specific third party application?

THANKS!!!

+1  A: 

There are "spiders" that will crawl your site and generate some form of search index. How reliable they are and how well they perform, I really can't say. We recently purchased two Google Search Appliances here at work and use one for our intranet and one for our external site. They do a very nice job of indexing exactly the content you want, and they also support specialized "search zones" and even keyword mapping.

I highly recommend them: http://www.google.com/enterprise/mini/

Nicholas Kreidberg
+3  A: 

Instead of paying for a search appliance, you can also pay Google to crawl your site and present customized search results. It's inexpensive, and Google does a good job of indexing everything (including PDFs). If I remember correctly, the ad-supported version is free (i.e., you pay to remove the ads).

pantulis
A: 

Add a link to Google that only returns results for your domain (with a site: delimiter). I don't know how to do this offhand, but it shouldn't be hard.
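
A minimal sketch of the idea, assuming a hypothetical example.com domain (substitute your own) and Google's standard search URL:

<!-- a fixed link that searches only example.com for "widgets" -->
<a href="http://www.google.com/search?q=site%3Aexample.com+widgets">
    Search example.com for "widgets"
</a>

For a real search box you would build the q parameter from user input, as the JavaScript example further down this page does.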

BCS
A: 

Thanks all! I'm currently looking into a Google Custom Search Engine. The search bars with logos are cumbersome, but if all Google wants for the legwork on this is a watermarked search bar and a couple of ads served, then that's the solution for me!

+1  A: 

The Google search is the easiest route. The only thing I would suggest is that you add a Google Sitemap to your site. That way you can notify Google of updates or new pages to make sure the search listing is as up to date as possible.
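
A minimal sitemap.xml sketch, following the standard sitemaps.org protocol; the example.com URLs and dates are placeholders for your own pages:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page you want indexed -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2009-01-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
    <lastmod>2009-01-01</lastmod>
  </url>
</urlset>

You can submit the file through Google Webmaster Tools, or point crawlers at it with a "Sitemap: http://www.example.com/sitemap.xml" line in robots.txt.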

cOle2
+1  A: 

If you can write some code in your favorite programming language, you could also have a look at Apache Solr (url). The concept is simple: you get a separate search server, already implemented and running as its own program. You add documents by POSTing them (HTTP POST) to the search server, and you run searches by issuing a GET request and getting back an XML file with the results.

What you have to write is the code that sends the files to the search server (only a few lines) and the parsing of the XML search results (easily done with XSLT).

I don't know how many documents you are talking about, but this solution scales very well; I currently use it with 2.5 million pages in the index and get results in under 50 ms.
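
A rough sketch of those two requests, assuming a default Solr install on localhost:8983 and a schema that defines id, title and text fields (adjust the names to your own schema.xml):

<!-- XML body POSTed to http://localhost:8983/solr/update to add a page -->
<add>
  <doc>
    <field name="id">/about.html</field>
    <field name="title">About Us</field>
    <field name="text">the extracted page text goes here</field>
  </doc>
</add>

<!-- POST this afterwards so the new document becomes searchable -->
<commit/>

A search is then a plain GET, e.g. http://localhost:8983/solr/select?q=text%3Aapache&rows=10, which returns the matching documents as XML that you can transform with XSLT.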

theomega
A: 

Here's how I did the search on my blog (using Google)... I don't remember where I got this template from originally, but from the comments I guess it came from javascriptkit.com. :)

<script type="text/javascript">

// Google Internal Site Search script- By JavaScriptKit.com(http://www.javascriptkit.com) 
// For this and over 400+ free scripts, visit JavaScript Kit-http://www.javascriptkit.com/ 
// This notice must stay intact for use

//Enter domain of site to search. 
var domainroot="ericasberry.com"

function Gsitesearch(curobj) 
{ 
    // prepend "site:<domain>" to the user's query before it is submitted to Google
    curobj.q.value = "site:" + domainroot + " " + curobj.qfront.value;
}

</script>


<form action="http://www.google.com/search" method="get"
    onSubmit="Gsitesearch(this)">

<p>Search ericasberry.com:<br /> 
<input name="q" type="hidden" /> 
<input name="qfront" type="text" style="width: 180px" /> 
<input type="submit" value="Search" /></p>

</form>
Eric Asberry
A: 

Google Ajax Search API
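
A minimal sketch of the usual pattern from Google's documentation for this API; YOUR-API-KEY and example.com are placeholders for your own key and domain:

<script src="http://www.google.com/jsapi?key=YOUR-API-KEY" type="text/javascript"></script>
<script type="text/javascript">
  // load the AJAX Search API and draw a search control restricted to one site
  google.load("search", "1");

  function onLoad() {
    var searchControl = new google.search.SearchControl();
    var siteSearch = new google.search.WebSearch();
    siteSearch.setSiteRestriction("example.com"); // placeholder domain
    searchControl.addSearcher(siteSearch);
    searchControl.draw(document.getElementById("searchcontrol"));
  }

  google.setOnLoadCallback(onLoad);
</script>

<div id="searchcontrol">Loading...</div>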

miceuz