views:

1280

answers:

6

Best way to implement a

  • dictionary (is their any DS better than Trie for Dictionary)
  • thesaurus (no idea, as match is made on meanings of the words, similar meanings)
  • spell checker (something better than a hash map), if possible with correct spelling recommendations.

when asked in an 1-hr interviews, are we expected to write a c/c++ code, for the algorithm.

+5  A: 

See this for a 21-line Python 2.5 spelling corrector and a bit of background.

Anton Gogolev
Wow great, there is a facebook puzzle I struggled so much with: http://www.facebook.com/careers/puzzles.php#/careers/puzzles.php?puzzle_id=17 that would be just trivial now...
Matthieu M.
+1  A: 

Also check out the Bloom filter.

Anonymous
You've been a user for a year and haven't picked a name yet??
Ether
+1  A: 

For a dictionary I would use the std::map (calling Dictionary in the .Net framework) collection with the word as key and a custom object (with all information about the word + the definition) as value.

For a thesaurus the best structure is a tree where each node is a section and where each branch finished with an object which contains all information about what you have to display.

Patrice Bernassola
+1  A: 

I can see no better data structure than a trie for the dictionary and the thesaurus. Both can be fitted in one structure if needed with one link in the node pointing to the meaning of the word (dictionary) and one to synonyms (thesaurus). It can even offer some form of autocompletion (when there's only one link in the node).

Spelling corrector is a bit trickier - since one has to map fals input to some kind of correct input. You can take this link as a start: http://en.wikipedia.org/wiki/Spell_checker. At the end you'll find links to papers about different algorithms. According to the wikipedia article, this paper describes the most successfull algorithm: Andrew Golding and Dan Roth's "Winnow-based spelling correction algorithm"

Tobias Langner
A: 

In all three cases, you can construct a BK-Tree from your word set. BK-Trees let you find all words within a given edit distance of the entered word. See my blog post on BK-Trees for an explanation of how they work.

A dictionary and spell checker are more or less identical - the dictionary simply needs to provide definitions along with the words. For a thesaurus, words are clustered into 'synsets' - synonym sets - with all the elements inserted into the BK-Tree. When someone looks up one word in the synset, you display all the other ones as alternatives. A word can appear in multiple synsets, so you need to ensure your BK-Tree nodes can handle multiple values for a given key.

Nick Johnson
+2  A: 

For the dictionary, there is indeed a data structure superior to the trie. Try a DAWG, or CDAWG: http://en.wikipedia.org/wiki/Directed_acyclic_word_graph. Just to complicate matters, my favorite paper on the structure, Ciura and Deorowicz's "How to Squeeze a Lexicon" calls them "minimal ADFAs". Google around and you'll find many competing algorithms for constructing these beasts. Good luck!

Grynszpan
That is a very interesting DS. Thanks for resurrecting the question to share it.
hemp