views:

1051

answers:

4

Hello,

I am trying to implement an internal search for my website that can point users in the right direction in case the mistype a word, something like the did you mean : in google search.

Does anybody have an idea how such a search can be done? How can we establish the relevance of the word or the phrase we assume the user intended to search for?

  • i use asp.net and sql server 2005 with FTS (fullTextSearch)

Thank you

A: 

The simplest approach I can think of is to write a function that returns the degree of mismatch between two words, and you loop through all the words and find the best ones.

I've done this with a branch-and-bound method. Let me dig up the code:

bool matchWithinBound(char* a, char* b, int bound){
  // skip over matching characters
  while(*a && *b && *a == *b){a++; b++;}
  if (*a==0 && *b==0) return true;
  // if bound too low, quit
  if (bound <= 0) return false;
  // try assuming a has an extra character
  if (*a && matchWithinBound(a+1, b, bound-1)) return true;
  // try assuming a had a letter deleted
  if (*b && matchWithinBound(a, b+1, bound-1)) return true;
  // try assuming a had a letter replaced
  if (*a && *b && matchWithinBound(a+1, b+1, bound-1)) return true;
  // try assuming a had two adjacent letters swapped
  if (a[0] && a[1]){
    char temp;
    int success;
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    success = matchWithinBounds(a, b, bound-1);
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    if (success) return true;
  }
  // can try other modifications
  return false;
}

int DistanceBetweenWords(char* a, char* b){
  int bound = 0;
  for (bound = 0; bound < 10; bound++){
    if (matchWithinBounds(a, b, bound)) return bound;
  }
  return 1000;
}
Mike Dunlavey
+2  A: 

This is done querying through regular expression the closest keywords that match the phrase.

Here is a great article that might help you.

Igor Zelaya
Indeed a very good article. +1
0xA3
+1 for the article. But I think it's not what's asked for. =) The functionality in question is more like "Did you mean Jon Skeet?" when someone searches for "guru".
PEZ
hahahaha....for that you need AI.
Igor Zelaya
I picked up that joke from http://stackoverflow.com/questions/305223#306973. What I mean is that it's different doing completion-as-you-type and correcting spelling.
PEZ
it is indeed a good article - Unfortunately, if the user enters the wrong character, how would you know? Let's say a user enters "Skatch" instead of "Sketch", does some search engine run algorithms like the Levenstein distance one to figure the closest match?
zaladane
+3  A: 

You could use an algorithm for determining string similarity and then suggest other string from your search index up to a certain difference.

One of these algorithms is the Levenshtein distance.

However, don't forget searching for existing solutions. I think e.g. Lucene has the capability to search for similar strings.

Btw, here's a related post on this topic: How does the Google “Did you mean?” Algorithm work?

0xA3
I didn't even know such algorithm existed !!!
zaladane
A: 

With T-SQL You can use the SOUNDEX function to compare words phonetically.

If you take the users input and then compare it with other words in your database by soundex code, you should be able to come up with a list of 'do you mean'? words.

E.g.

select SOUNDEX('andrew')
select SOUNDEX('androo')

will both produce the same output (A536).

There are better algorithms these days, but soundex is built into sql server.

Andrew Rollings