Hi,
I'm looking for an open-source web search library that does not use a search index file. Do you know any?
Thanks, Kenneth
The original poster clarified in a comment to this reply that what he is looking for is essentially "greplike search but through HTTP", and that it needs to use little disk space because he is working with an embedded system.
I am not aware of any related projects, but you might want to look at HTML parsers and XQuery implementations in your language of choice. The former should take care of the "real-life" messiness of HTML, and the latter lets you write a search that is almost as detailed as you might desire.
I assume that you will be working with a set of URLs that is either provided or already stored locally, since actually crawling the whole web, discovering links and so on, is thoroughly unrealistic on an embedded device.
That said, with a good HTML/XQuery implementation you do have the tools to extract all the links.
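To make the "fixed list of URLs, no index" idea concrete, here is a rough sketch in plain sh. It is not a real HTML parser or XQuery engine: it swaps in sed and grep as crude stand-ins, so tags that span several lines will slip through. The names urlgrep, extract_links and the FETCH variable are made up for illustration, and curl is only an assumed default; any command that writes a URL's body to stdout would do.

```sh
#!/bin/sh
# Assumption: FETCH is any command that prints the body of a URL to stdout.
FETCH=${FETCH:-"curl -fsS"}

# Hypothetical helper: print every URL from a list file whose visible text
# matches an extended regular expression. Nothing is stored on disk.
# Usage: urlgrep PATTERN URL_LIST_FILE
urlgrep() {
    pattern=$1
    list=$2
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        # Fetch the page, strip single-line tags, and test what is left.
        if $FETCH "$url" </dev/null 2>/dev/null \
            | sed -e 's/<[^>]*>//g' \
            | grep -E -q -e "$pattern"; then
            printf '%s\n' "$url"
        fi
    done < "$list"
}

# Hypothetical helper: list double-quoted href targets in HTML read from stdin.
# Relies on grep -o, which is common (GNU, BSD, BusyBox) but not POSIX.
extract_links() {
    grep -o -i 'href="[^"]*"' | sed -e 's/^[^"]*"//' -e 's/"$//'
}
```

Something like `urlgrep 'embedded' urls.txt` would then print the matching URLs, one per line. Every query refetches every page, which is the price of having no index.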
My original answer, which was really a request for clarification:
Not sure what you mean. How do you picture a search working without an index? Crawling the web for every query? Piping the query through to Google? Or are you referring to a specific kind of search index file that you are trying to avoid?
You mean:
search.cgi
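#!/bin/sh
# Pull the value of the "s" parameter out of the CGI query string (not URL-decoded).
arg=`printf '%s\n' "$QUERY_STRING" | sed -e 's/^s=//' -e 's/&.*$//'`
cd /var/www/httpd || exit 1
find . -type f | xargs grep -E -l -e "$arg" | awk 'BEGIN {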
print "Content-type: text/html";
print "";
print "<HTML><HEAD><TITLE>Search Result</TITLE></HEAD>";
print "<BODY><P>Here are your search results, sorry it took so long.</P>";
print "<UL>";
}
{ sub(/^\.\//, ""); print "<LI><A HREF=\"http://yourhost.com/" $0 "\">" $0 "</A></LI>"; }
END {
print "</UL></BODY></HTML>";
}'
Untested...
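One thing the script leaves out: a CGI query string arrives URL-encoded, so a search for "foo bar" shows up as "foo+bar" or "foo%20bar". Below is a sketch of a decoder that could be applied to the extracted value before it is handed to grep. The function name urldecode is made up, and the awk code only handles ASCII reliably, since bytes above 127 depend on the awk implementation and locale.

```sh
#!/bin/sh
# Hypothetical helper: decode a URL-encoded form value.
# Usage: urldecode 'foo%20bar+baz'
urldecode() {
    printf '%s\n' "$1" | awk '{
        # In form-encoded data a literal "+" stands for a space.
        gsub(/\+/, " ")
        digits = "0123456789ABCDEF"
        out = ""
        # Replace each %XX escape with the character it encodes.
        while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            hex = toupper(substr($0, RSTART + 1, 2))
            code = (index(digits, substr(hex, 1, 1)) - 1) * 16 \
                 + (index(digits, substr(hex, 2, 1)) - 1)
            out = out substr($0, 1, RSTART - 1) sprintf("%c", code)
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}
```

The plus signs are converted first so that an encoded "%2B" still comes out as a literal "+".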
I guess there is none (at least none popular enough for users here to be aware of).
We've gone ahead and coded our own search system.