I'm developing: http://www.buscatiendas.com.mx

I've seen people entering queries with lots of typos. What kind of search could I implement so that similar words are found? Something more or less like what Google does would be neat.

I'm using SQL Server Full Text search.

A: 

There are two ways to solve this:

  1. Buy a third-party product, such as a Google Search Appliance or one of Microsoft's search servers.

  2. Log all queries and have someone review them, building a table that maps the bad queries to what they should have been. (You may be able to buy a component library that does this, much like a spelling checker.)
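As a rough illustration of the second option, the reviewed table can be as simple as a dictionary consulted before the query reaches the search engine. This is only a sketch: the `CORRECTIONS` entries and the `normalize_query` helper are made up for the example, and in practice the table would live in the database and be filled from the query log.

```python
# Hypothetical correction table; a reviewer would build this from the query log.
CORRECTIONS = {
    "wallmart": "walmart",
    "farmasia": "farmacia",
}

def normalize_query(query):
    """Replace each known-bad word with its reviewed correction."""
    words = query.lower().split()
    return " ".join(CORRECTIONS.get(word, word) for word in words)

if __name__ == "__main__":
    print(normalize_query("Wallmart monterrey"))
```

Words that are not in the table pass through unchanged, so the table only has to cover the typos that actually show up in the log.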

Bravax
A: 

If you want to roll your own, you first need to filter out noise words before you even start searching, because they just put unnecessary load on your database. Should "a good book" return the same results as "the good book", "his good book", or "good and bad reviews on a book"? Words like "a", "the", "an", and "and" clearly do not qualify as useful search keywords. Once the noise is filtered out of the query, you can start the real search. Again, consider database performance: is it wiser to search the live database or a pre-processed copy? Figure out a way to filter the noise words out of the searched data too.
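A minimal sketch of the noise-word filtering step might look like this. The `NOISE_WORDS` set and the `strip_noise` name are assumptions for the example; a real list would be tuned to the language and content of the site.

```python
# Hypothetical noise-word list; extend it for the language your users search in.
NOISE_WORDS = {"a", "an", "and", "the", "his", "on", "of", "for"}

def strip_noise(query):
    """Return the query's words in order, lowercased, without noise words."""
    return [word for word in query.lower().split() if word not in NOISE_WORDS]

if __name__ == "__main__":
    for q in ("a good book", "the good book", "his good book"):
        print(q, "->", strip_noise(q))
```

With this list, the three "good book" queries above all reduce to the same two keywords, so they can share one search.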

stillstanding
+1  A: 

Why don't you have Google or Bing index the site for you, and then search it through the site: operator they provide?

If that is not an option, you may need a spell checker of your own (implement one yourself or use an existing one) that is trained on the data you have. Note that spell checking is not deterministic: for "latel", for example, did the user mean "label" or "later"? You can only make a best guess based on the actual data on your site.

There are probabilistic models that let you train your spell checker to come up with a best guess.

The following page seems pretty useful. It describes how to write one yourself, and it has good links (including a survey paper) as well as links to implementations in different languages:

http://norvig.com/spell-correct.html
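To make the "best guess" idea concrete, here is a small sketch that loosely follows the edit-distance approach: generate every string one edit away from the typed word, keep the ones that occur in your own data, and pick the most frequent. It is not the code from that page. The `WORD_COUNTS` values are invented for the example and would really be counted from the site's text, and `LETTERS` assumes plain a-z, so accented characters would have to be added for Spanish content.

```python
from collections import Counter

# Hypothetical word frequencies; in practice, count the words in your own site data.
WORD_COUNTS = Counter({"label": 5, "later": 12, "hotel": 30})

# Assumption: unaccented lowercase letters only.
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, or word itself."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word
    return max(candidates, key=lambda w: WORD_COUNTS[w])

if __name__ == "__main__":
    print(correct("latel"))
```

Both "label" and "later" are one edit away from "latel"; with these made-up counts the guess is "later" simply because it is the more frequent word in the data.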

Moron