ansaurus

Question

Where can I learn more about the Google search "did you mean" algorithm?

Answer 1

+14 A:

See your-favorite-natural-language-parser for examples of NLP parsers that you could use to build similar functionality with, also see how-to-correct-the-user-input-kind-of-google-did-you-mean

See other NLP questions by clicking on the NLP tag or searching for [NLP] in the search window.

oluies 2010-09-21 19:17:25

Answer 2

+89 A:

Travel to 1600 Amphitheater Parkway, Mountain View, California, United States. Knock on the door and ask very nicely.

Tyler McHenry 2010-09-21 19:18:01

They have doors?!

BoltClock 2010-09-21 19:19:19

They have windows too, but they don't like to admit it.

Tyler McHenry 2010-09-21 19:19:58

I heard they were trying to remove windows from their campus.

percent20 2010-09-21 19:20:56

You cannot emphasize the "very" enough...

wtaniguchi 2010-09-21 19:22:37

I hear that not only do they have doors and Windows, they have Gates too.

JUST MY correct OPINION 2010-09-22 06:12:37

Failing that, you can always buy Google for $165B

gAMBOOKa 2010-09-22 11:20:39

Use comments to crack jokes, not answers.

Roger Pate 2010-09-22 19:16:11

@Roger Normally I do, but this joke benefits from appearing as an answer. And when I posted this answer the question had four close votes and no legit answers. I never really expected that this question would get so popular rather than just be closed and forgotten five minutes later. I did make it CW, though, since I didn't want rep from a non-answer, and I voted up BrokenGlass' real answer when it showed up.

Tyler McHenry 2010-09-22 20:17:39

@Tyler: As the situation stands, your joke is detracting from the answers that actually try to answer the question. I don't see why this joke benefits from being an answer nor why your expectation of this question being closed and forgotten would affect posting a comment vs. answer, but given the situation as it is, will you do the right thing by deleting and reposting as a comment? You might've noticed everyone that used comments on the question to crack jokes (at around the same time you posted)... why do you think they didn't post answers?

Roger Pate 2010-09-22 20:38:24

come on delete it

oluies 2010-09-23 21:11:40

Answer 3

+97 A:

You should check out Peter Norvigs article about implementing the spell checker in a few lines of python: How to Write a Spelling Corrector It also has links for implementations in other languages (i.e. C#)

BrokenGlass 2010-09-21 19:19:46

Side fact: Peter Norvig is Director of Research at Google.

Gumbo 2010-09-22 06:19:48

This answer should be marked as accepted. Norvig's algorithm solves OP's problem, is pretty awesome, *and* it comes from Google. :)

ionut bizau 2010-09-23 03:32:26

Answer 4

+1 A:

AFAIK the "did you mean ?" feature doesn't check the spelling. It only gives you another query based on the content parsed by google.

Colin Hebert 2010-09-21 19:20:42

No, it guesses alternatives based on misspellings. If you search for "katie sachoff" it comes up with "Did you mean katee sackhoff?"

ebneter 2010-09-21 22:30:15

I recently read an article in which a Google employee expounded upon how they have the world's most advanced spellchecker, since it will take the context of a word into account in ways that few others do.

Alex JL 2010-09-22 03:50:05

@Alex JL- And they're probably right.

DMan 2010-09-22 04:30:39

@Alex JL, But it still doesn't check anything. Proposing an alternative with a common mistake, or a really similar but more common word isn't a spelling correction it's just guessing.

Colin Hebert 2010-09-22 05:39:36

@Colin Not sure what you mean - isn't that what every spell checker does? Detect a mispelled word, and use heuristics to guess what you mean instead? I mean, I misspelled 'misspelled' and Firefox is suggesting misspelled, dispelled, respelled, etc. It's not like they're artificial intelligence or something. I agree with Google that theirs works very well.

Alex JL 2010-09-22 07:30:49

@Alex JL, I mean that if "everybody" write a word the wrong way, google will find the wrong word more important that the good one. It's isn't based on spelling rules but on most common use.

Colin Hebert 2010-09-22 16:38:12

@Alex JL, for example (in french) the word "Obtue" is a common mistake, the correct spelling is "Obtuse", but as the mistake is really common, Google won't say anything about this word. Or in english if you search for "alterior" instead of "ulterior" it's considered as okay because it's used frequently.

Colin Hebert 2010-09-22 17:27:10

Answer 5

+24 A:

I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter was saying that (at least part of) their algorithm has little intelligence at all; but rather, utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, and then does another search for "Britney Spears", and clicks on something, we can have a fair guess about what they were searching for, and can suggest that in future.

Disclaimer: This may have just been part of their algorithm

Smashery 2010-09-22 04:23:39

RE Disclaimer: I assume it was/is. It's a very safe way to go about it. I couldn't imagine anybody coming up with an algorithm that searches a database full of english words, then trying to determine whether or not the query is similar to existing data.

lucifer 2010-09-30 18:39:37

Answer 6

+2 A:

I'd take a look at this article on google bombing. It shows that it just suggests answers based off previously entered results.

Kyle 2010-09-22 04:28:20

Yes, I think it learns from what other people had corrected certain searches to. For instance if you search for 'hunrgy man dinner' and then click on nothing and change it to 'hungry man dinner', Google takes note of that next time it gets the first search. I'm sure they have more tricks than that, too, such as a traditional spellcheck in there somewhere.

Mark Snidovich 2010-09-22 04:56:45

Answer 7

+5 A:

You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html which would give a similar functionality.

Oligoglot 2010-09-22 05:01:20

Answer 8

+3 A:

You can check out the source code for Xapian which provides this functionality, as do a lot of other search libraries. http://xapian.org/

m7d 2010-09-22 05:07:11

Answer 9

+14 A:

Python has a module called difflib. It provides a functionality called get_close_matches. From the Python Documentation:

get_close_matches(word, possibilities[, n][, cutoff])

Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

  >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
  ['apple', 'ape']
  >>> import keyword
  >>> get_close_matches('wheel', keyword.kwlist)
  ['while']
  >>> get_close_matches('apple', keyword.kwlist)
  []
  >>> get_close_matches('accept', keyword.kwlist)
  ['except']

Could this library help you?

Arrieta 2010-09-22 05:08:46

I didn't know about this... This is awesome!! :)

st0le 2010-09-23 06:55:37

Answer 10

+3 A:

I am not sure if it serves your purpose but a String Edit distance Algorithm with a dictionary might suffice for a small Application.

Geek 2010-09-22 05:58:01

Answer 11

+1 A:

A great chapter to this topic can be found in the openly available Introduction to Information Retrieval.

Fabian Steeg 2010-09-23 10:32:37

Answer 12

A:

U could use ngram for the comparisment: http://en.wikipedia.org/wiki/N-gram

Using python ngram module: http://packages.python.org/ngram/index.html

import ngram

G2 = ngram.NGram([  "iis7 configure ftp 7.5",
                    "ubunto configre 8.5",
                    "mac configure ftp"])

print "String", "\t", "Similarity"
for i in G2.search("iis7 configurftp 7.5", threshold=0.1):
    print i[0], "\t", i[1]

U get:

>>> 
String  Similarity
"iis7 configure ftp 7.5"    0.76
"mac configure ftp  0.24"
"ubunto configre 8.5"   0.19

hugo24 2010-09-23 11:31:01

An N-Gram index is the only sound solution I've seen among the answers, why is this tumbled down? Well... aside of Peter Norvig's. But N-Grams can do it quite good.

mrrtnn 2010-10-07 01:28:48

Thank u :) N-Grams are the prefered way at google... as far as i know.

hugo24 2010-10-07 13:12:43

Answer 13

A:

take a look at Levenshtein-Automata

Francis 2010-09-23 16:39:26

ansaurus

tags:

views:

answers:

Where can I learn more about the Google search "did you mean" algorithm?

related questions