views:

53

answers:

2

Hi there, I've done some Google searching but couldn't find what I was looking for.

I'm developing a scrabble-type word game in rails, and was wondering if there was a simple way to validate what the player inputs in the game is actually a word. They'd be typing the word out.

Is validation against some sort of English language dictionary database loaded within the app best way to solve this problem? If so, are there any libraries that offer this kind of functionality? If not, what would you suggest?

Thanks for your help!

A: 

A piece of language-agnostic advice here, is that if you only care about the existence of a word (which in such a case, you do), and you are planning to load the entire database into the application (which your query suggests you're considering) then a DAWG will enable you to check the existence in O(n) time complexity where n is the size of the word (dictionary size has no effect - overall the lookup is essentially O(1)), while being a relatively minimal structure in terms of memory (indeed, some insertions will actually reduce the size of the structure, a DAWG for "top, tap, taps, tops" has fewer nodes than one for "tops, tap").

Jon Hanna
For ruby, you'd probably want to use a `Set`: an array would take a time linearly proportional to the dictionary size, and if you used a hash, you'd use the keys but not the values.
Andrew Grimm
Unless the dictionary is very small, the even a poor and interpretted DAWG will beat a hash-based Set. DAWG is is essentially O(1) in terms of dictionary size, only word-size affects it (hash-creation tends to also be affected by word size), but with all other factors concerning the algorithms DAWG comes out better. DAWG is a pretty normal structure for large sets of strings (words, DNA sequences, etc).
Jon Hanna
A: 

You need two things:

  1. a word list
  2. some code

The word list is the tricky part. On most Unix systems there's a word list at /usr/share/dict/words or /usr/dict/words -- see http://en.wikipedia.org/wiki/Words_(Unix) for more details. The one on my Mac has 234,936 words in it. But they're not all valid Scrabble words. So you'd have to somehow acquire a Scrabble dictionary, make sure you have the right license to use it, and process it so it's a text file.

The code is no problem in the simple case. Here's a script I whipped up just now:

words = {}
File.open("/usr/share/dict/words") do |file|
  file.each do |line|
    words[line.strip] = true
  end
end
p words["magic"]
p words["saldkaj"]

This will output

true
nil

I leave it as an exercise for the reader to make it into a proper Words object. (Technically it's not a Dictionary since it has no definitions.) Or to use a DAWG instead of a hash, even though a hash is probably fine for your needs.

Alex Chaffee
Well, it's a scrabble variant type game, so any reputable (webster's et al) dictionary would probably be fine. Would I just wrap the code in a validator function and substitute "usr/share/dict/words" (or the source of where I'm getting the words) with a dictionary API?
dartfrog