+14  A: 

See your-favorite-natural-language-parser for examples of NLP parsers that you could use to build similar functionality with, also see how-to-correct-the-user-input-kind-of-google-did-you-mean

See other NLP questions by clicking on the NLP tag or searching for [NLP] in the search window.

oluies
+89  A: 

Travel to 1600 Amphitheater Parkway, Mountain View, California, United States. Knock on the door and ask very nicely.

Tyler McHenry
They have doors?!
BoltClock
They have windows too, but they don't like to admit it.
Tyler McHenry
I heard they were trying to remove windows from their campus.
percent20
You cannot emphasize the "very" enough...
wtaniguchi
I hear that not only do they have doors and Windows, they have Gates too.
JUST MY correct OPINION
Failing that, you can always buy Google for $165B
gAMBOOKa
Use comments to crack jokes, not answers.
Roger Pate
@Roger Normally I do, but this joke benefits from appearing as an answer. And when I posted this answer the question had four close votes and no legit answers. I never really expected that this question would get so popular rather than just be closed and forgotten five minutes later. I did make it CW, though, since I didn't want rep from a non-answer, and I voted up BrokenGlass' real answer when it showed up.
Tyler McHenry
@Tyler: As the situation stands, your joke is detracting from the answers that actually try to answer the question. I don't see why this joke benefits from being an answer nor why your expectation of this question being closed and forgotten would affect posting a comment vs. answer, but given the situation as it is, will you do the right thing by deleting and reposting as a comment? You might've noticed everyone that used comments on the question to crack jokes (at around the same time you posted)... why do you think they didn't post answers?
Roger Pate
come on delete it
oluies
+97  A: 

You should check out Peter Norvigs article about implementing the spell checker in a few lines of python: How to Write a Spelling Corrector It also has links for implementations in other languages (i.e. C#)

BrokenGlass
Side fact: Peter Norvig is Director of Research at Google.
Gumbo
This answer should be marked as accepted. Norvig's algorithm solves OP's problem, is pretty awesome, *and* it comes from Google. :)
ionut bizau
+1  A: 

AFAIK the "did you mean ?" feature doesn't check the spelling. It only gives you another query based on the content parsed by google.

Colin Hebert
No, it guesses alternatives based on misspellings. If you search for "katie sachoff" it comes up with "Did you mean katee sackhoff?"
ebneter
I recently read an article in which a Google employee expounded upon how they have the world's most advanced spellchecker, since it will take the context of a word into account in ways that few others do.
Alex JL
@Alex JL- And they're probably right.
DMan
@Alex JL, But it still doesn't check anything. Proposing an alternative with a common mistake, or a really similar but more common word isn't a spelling correction it's just guessing.
Colin Hebert
@Colin Not sure what you mean - isn't that what every spell checker does? Detect a mispelled word, and use heuristics to guess what you mean instead? I mean, I misspelled 'misspelled' and Firefox is suggesting misspelled, dispelled, respelled, etc. It's not like they're artificial intelligence or something. I agree with Google that theirs works very well.
Alex JL
@Alex JL, I mean that if "everybody" write a word the wrong way, google will find the wrong word more important that the good one. It's isn't based on spelling rules but on most common use.
Colin Hebert
@Alex JL, for example (in french) the word "Obtue" is a common mistake, the correct spelling is "Obtuse", but as the mistake is really common, Google won't say anything about this word. Or in english if you search for "alterior" instead of "ulterior" it's considered as okay because it's used frequently.
Colin Hebert
+24  A: 

I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter was saying that (at least part of) their algorithm has little intelligence at all; but rather, utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, and then does another search for "Britney Spears", and clicks on something, we can have a fair guess about what they were searching for, and can suggest that in future.

Disclaimer: This may have just been part of their algorithm

Smashery
RE Disclaimer: I assume it was/is. It's a very safe way to go about it. I couldn't imagine anybody coming up with an algorithm that searches a database full of english words, then trying to determine whether or not the query is similar to existing data.
lucifer
+2  A: 

I'd take a look at this article on google bombing. It shows that it just suggests answers based off previously entered results.

Kyle
Yes, I think it learns from what other people had corrected certain searches to. For instance if you search for 'hunrgy man dinner' and then click on nothing and change it to 'hungry man dinner', Google takes note of that next time it gets the first search. I'm sure they have more tricks than that, too, such as a traditional spellcheck in there somewhere.
Mark Snidovich
+5  A: 

You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html which would give a similar functionality.

Oligoglot
+3  A: 

You can check out the source code for Xapian which provides this functionality, as do a lot of other search libraries. http://xapian.org/

m7d
+14  A: 

Python has a module called difflib. It provides a functionality called get_close_matches. From the Python Documentation:

get_close_matches(word, possibilities[, n][, cutoff])

Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

  >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
  ['apple', 'ape']
  >>> import keyword
  >>> get_close_matches('wheel', keyword.kwlist)
  ['while']
  >>> get_close_matches('apple', keyword.kwlist)
  []
  >>> get_close_matches('accept', keyword.kwlist)
  ['except']

Could this library help you?

Arrieta
I didn't know about this... This is awesome!! :)
st0le
+3  A: 

I am not sure if it serves your purpose but a String Edit distance Algorithm with a dictionary might suffice for a small Application.

Geek
+1  A: 

A great chapter to this topic can be found in the openly available Introduction to Information Retrieval.

Fabian Steeg
A: 

U could use ngram for the comparisment: http://en.wikipedia.org/wiki/N-gram

Using python ngram module: http://packages.python.org/ngram/index.html

import ngram

G2 = ngram.NGram([  "iis7 configure ftp 7.5",
                    "ubunto configre 8.5",
                    "mac configure ftp"])

print "String", "\t", "Similarity"
for i in G2.search("iis7 configurftp 7.5", threshold=0.1):
    print i[0], "\t", i[1]

U get:

>>> 
String  Similarity
"iis7 configure ftp 7.5"    0.76
"mac configure ftp  0.24"
"ubunto configre 8.5"   0.19
hugo24
An N-Gram index is the only sound solution I've seen among the answers, why is this tumbled down? Well... aside of Peter Norvig's. But N-Grams can do it quite good.
mrrtnn
Thank u :) N-Grams are the prefered way at google... as far as i know.
hugo24
A: 

take a look at Levenshtein-Automata

Francis