As the title says, I need a search engine for searching MySQL data. My website is PHP-based.

I was going with Sphinx, but my hosting company doesn't support full-text indexes!

So I need a search engine that can be used without full-text indexes!

It should be pretty powerful, and must include at least the functions below:

  • When searching for 'bmw 520', only matches where these two words appear in exactly this order are returned, not matches for only 'bmw' or only '520'.

  • When searching for 'bmw 330ci', results as above will be returned, but WITH AND WITHOUT the 'ci' extension. There are a number of extensions in car models, as you all know (i, ci, si, fi etc).

  • I want the 'minus sign' to 'exclude' all results containing the word after the sign, e.g. 'bmw -330' will return all 'bmw' results without the '330' ones. (A NOT instead of the minus sign is also OK.)

  • All accented characters such as 'é' are converted to their plain equivalents, in this case 'e'.

  • A list of words to ignore completely in the search.

Thanks guys!

+6  A: 

The Zend_Search_Lucene search component works fairly well. I am not sure how it would cope with your second requirement; however, if you customized the tokenizer, you should be able to do it by treating a change from letters to numbers as the start of a new word.
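To illustrate the tokenizer idea, here is a minimal sketch in Python rather than PHP. It is not Zend_Search_Lucene code; `tokenize` is a hypothetical helper that only shows the splitting rule, where a switch between letters and digits starts a new token.

```python
import re

# A token is either a run of letters or a run of digits.
# Anything else (spaces, hyphens, punctuation) acts as a separator.
_TOKEN_RE = re.compile(r"[^\W\d_]+|\d+")


def tokenize(text):
    """Lowercase the text and split it into letter runs and digit runs."""
    return _TOKEN_RE.findall(text.lower())


if __name__ == "__main__":
    print(tokenize("BMW 330ci"))
    print(tokenize("bmw 330"))
```

With this rule, '330ci' is indexed as the two tokens '330' and 'ci', so a query for 'bmw 330' can also match text containing 'bmw 330ci'. Making the trailing 'ci' optional on the query side would still need extra handling.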

The one I am really not sure about is the top requirement. Given how the text is indexed, word order becomes irrelevant in the search, so you may not be able to do it without heavily editing Lucene, writing a filter (using Lucene to pull the matches, then checking the order), or writing your own solution. All of these will slow the search down and add load to your server.
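The filter option could look roughly like the sketch below, written in Python for illustration only. It assumes the search engine has already returned candidate texts that contain all the query words; `contains_in_order` and `filter_by_order` are hypothetical names, and splitting on whitespace is a simplification.

```python
def contains_in_order(doc_tokens, query_tokens):
    """Return True if query_tokens occur as one contiguous run in doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return True
    # Slide a window of length n over the document tokens.
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def filter_by_order(candidates, query):
    """Keep only the candidate texts that contain the query words in order."""
    query_tokens = query.lower().split()
    return [
        text for text in candidates
        if contains_in_order(text.lower().split(), query_tokens)
    ]


if __name__ == "__main__":
    hits = ["BMW 520 for sale", "520 BMW parts", "bmw touring 520"]
    print(filter_by_order(hits, "bmw 520"))
```

Every candidate is scanned again after the index lookup, which is where the extra load mentioned above comes from.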

There is also Solr, but I have never used it and don't know anything about it. Sphinx was another one, but I see you have already ruled that out.

Yacoby
Since you beat me to it, I deleted my answer and am adding the usage example here: http://dev.juokaz.com/php/starting-with-zend%5Fsearch%5Flucene
Gordon
A: 

Xapian is very good (very comprehensive) if you have the time for the initial setup.

It functions as you would expect a search engine to work: you tell the indexer what bits of information to index under what namespace/table/object (Page, Profile, Products etc), then issue a query for your users based on keywords. It also supports Google-style tags, e.g. "profile:Mark icecream" would search my profile for the word icecream. I seem to remember it supporting ranges too, for data you specify as numeric.

It can be used in local mode, which can offer spelling suggestions ("Did you mean?"), or in remote mode, which many sites can index to and query from.

What really saved me one time was the ability to attach transient, non-searchable data to an indexed item, e.g. attaching the DB id to all data indexed for that record. This is very useful for fetching the whole record from the DB when your matches come back from Xapian.

Question Mark
A: 

I have used a couple of search engines on my site during its time, but in the next rebuild I'm planning to move to Google Site Search.

There are several reasons for this:

  • Users are very familiar with the Google style of search result listings which improves usability and hence click-through rates
  • The Google engine is very good at guessing when to use the page description and when to use a fragment of the page (it is also very good at getting relevant fragments compared to some other engines)
  • It's used by thousands of very popular websites
  • Google is the most popular search engine around so you know their technology is both reliable and accurate

Google Site Search begins at $100 per annum for 1000 pages or fewer (with a limit on queries), or you can use the free Google Custom Search Engine (but this is much less customizable).

Joseph Earl