How do you make a search for "alien vs predator" also return results containing the string "alienS vs predator", with the "S"?
Example: http://www.torrentz.com/search?q=alien+vs+predator
How have they implemented this?
Is this advanced search engine stuff?

A: 
Alexander Sagen
I believe @LenaNicolr is asking how it searches on different forms of a word in the search phrase, i.e. the plural "aliens" versus the original term "alien".
jball
I don't think he's talking about case sensitivity. He's put the `S` in caps because searching for `alien vs predator` returns `aliens vs predator` in its results. Edit: blast you jball!
CanSpice
Yup, sorry about that, thanks for letting me know.
Alexander Sagen
A: 

This is a basic feature of a search engine, rather than just a program that matches your query with a set of pre-defined results.

If you have the time, this is a great read, all about different algorithms, and how they are implemented.

Greg
+4  A: 

Checking for plurals is a form of stemming. Stemming is a common feature of search engines and other text-matching tools. See the Wikipedia page http://en.wikipedia.org/wiki/Stemming for a host of stemming algorithms.

Scott Stafford
Why -1, silent assassin?
Scott Stafford
+6  A: 

This is known as Word Stemming. When the text is indexed, words are "stemmed" to their "roots". So fighting becomes fight, skiing becomes ski, runs becomes run, etc. The same thing is done to the text that a user enters at search time, so when the search terms are compared to the values in the index, they match.
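
To make that concrete, here is a minimal sketch in Python. The `naive_stem` function is a toy suffix stripper made up for illustration (it is not Porter's algorithm or what Lucene does, and it will mangle many words), and the function names and sample titles are assumptions. The point is only that the same stemming step runs at index time and at search time:

```python
from collections import defaultdict

def naive_stem(word):
    """Toy stemmer (illustrative only): lowercase, then strip 'ing' or a trailing 's'."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words like "vs" are untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def tokenize(text):
    """Split on whitespace and stem every token."""
    return [naive_stem(token) for token in text.split()]

def build_index(titles):
    """Map each stem to the set of title positions that contain it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for stem in tokenize(title):
            index[stem].add(doc_id)
    return index

def search(index, titles, query):
    """Return titles that contain every stemmed query term."""
    stems = tokenize(query)
    if not stems:
        return []
    matching = set.intersection(*(index.get(stem, set()) for stem in stems))
    return [titles[i] for i in sorted(matching)]

titles = ["Aliens vs Predator", "Alien", "Predator 2"]
index = build_index(titles)
results = search(index, titles, "alien vs predator")
```

Because "Aliens" was stored under the stem "alien", the query "alien vs predator" finds the title "Aliens vs Predator".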

The Lucene project supports this. I wouldn't consider it an advanced feature, especially given the expectations that Google has set.

Ryan Ische
+2  A: 

Typically when one sets up a search engine to search for text, one will construct a query that's something like:

SELECT * FROM TBLMOVIES WHERE NAME LIKE '%ALIEN%'

This means that the substring ALIEN can appear anywhere in the NAME field, so you'll get back strings like ALIENS.
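
As a rough, runnable illustration, here is the same query against an in-memory SQLite database from Python. The table contents and the `find_titles` helper are assumptions for the demo; only the table and column names come from the query above. Note that a substring match like this would also return unrelated titles such as "ALIENATED", which is one reason stemming is usually preferred.

```python
import sqlite3

def find_titles(conn, term):
    """Return names containing `term` anywhere, using a parameterized LIKE."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT name FROM tblmovies WHERE name LIKE ? ORDER BY name",
        (pattern,),
    )
    return [row[0] for row in rows]

# Sample data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblmovies (name TEXT)")
conn.executemany(
    "INSERT INTO tblmovies VALUES (?)",
    [("ALIEN",), ("ALIENS VS PREDATOR",), ("PREDATOR",)],
)

matches = find_titles(conn, "ALIEN")
```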

CanSpice
+3  A: 

When words are indexed, they are indexed by their root form. For example, "aliens", "alien", "alien's" and "aliens'" are all stored as "alien".

When a user searches, the search engine likewise reduces the query words to the root form "alien" and looks that up.

One widely used algorithm for this is the Porter stemming algorithm. You can download an implementation for your favorite language here: http://tartarus.org/~martin/PorterStemmer/

how
A: 

You could try using soundex() as a fuzzy match on your strings. If you save the soundex code alongside the title and then compare it against a prefix of the search term's soundex code using LIKE 'XXX%', you should get a decent match. The longer the prefix you compare, the closer the matches will be.

see docs: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex
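
To show why a prefix comparison works here, below is a small Python sketch of the classic four-character American Soundex. It is written for illustration and is not MySQL's implementation (MySQL's SOUNDEX() can return longer codes), and the function name is an assumption:

```python
# Letter-to-digit table for classic Soundex.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the four-character Soundex code of `word` ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]

alien_code = soundex("alien")
aliens_code = soundex("aliens")
```

Here "alien" encodes to A450 and "aliens" to A452, so comparing only the first three characters (LIKE 'A45%') treats them as a match, while comparing all four would not.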

AutoSponge