tags:

views:

217

answers:

6

Hi. Basically, what I'm looking for is some kind of class or method to implement a dictionary in PHP. For example, if I was building a word unscrambler - lets say I used the letters 'a,e,l,p,p'. The number of possibilities for arrangement is huge - how do I only display those which are actual words (apple, pale etc )?

Thanks!

A: 

i prefer to solve such troubles using some kind of rdbms.

just create temporary table with one column and 5 rows (1 per each letter) and then select with multiple joins (count of joins == number of letters) without ON statement.

in this case you will get dekart multiplication which equals all combinations ;-)

zerkms
umm, what?15 char
Rohan
-1: completely impractical for words of more than 6 or 7 letters.
Michael Borgwardt
sorry, thought you need just all the combinations, not real words.
zerkms
@Michael Borgwardt: i was reading wrong the task, yep
zerkms
Yes, that exactly is the problem. There are 170.581.728.179.578.208.256 combinations of 6 letters - recurring letters reduce that, but it's still beyond madness.
Michael Borgwardt
A: 

Store a list of words in a file or a database, and then just try all the combinations. You could also consider the likely position of vowels vs consonants to potentially speed it up. Rather than making your own word list, you could use something like WordNet.

Stephen Cross
interesting. How would I use WordNet with PHP?
Rohan
It'd be nice if someone gave a reason for voting this down. Anyway, in reply: http://wordnet.princeton.edu/wordnet/related-projects/#PHP
Stephen Cross
+3  A: 

Ah, and the another answer:

If you just want to get all real words - then find any big dictionary. then store it in manner of:

word | hash

where word is word itself and hash is sorted alphabetically letters:

for apple hash will be: aelpp or aelp2

then for given letters traverse all combinations using the same algo for hashing and search through this table.

zerkms
"hash" is the wrong word. "key" would be better - as in use it as a key in a hashtable.
Michael Borgwardt
agreed, "key" is more relevant here
zerkms
My question is, where do I get this big dictionary?
Rohan
for first - you can parse aspell/ispell dictionaries. we don't know how large enough should be dictionaries for you.
zerkms
I've used this method before. It worked very well since all I wanted to do was unscramble to known words in my dictionary.
chris
+1  A: 

Classically word lookup problems can be efficiently solved using a Trie.

I would suggest finding a word list, say, from WordNet, store it in a Trie, and then perform fast lookups of possible words.

A solution would be of the form:

  1. load the word list
  2. store the word list in a trie
  3. accept input for a word to unscramble
  4. try permutations i=1..N

    a. lookup permutation i using the trie

    b. if there's a positive result, store this for display

    c. iterate (i++)

  5. repeat from 3.

edit:

A side note here is that for any N length character word there could be N! required lookups (for 7 characters that would be 5040). You should consider making some optimizations to the trie lookup algorithm. For instance, you gain substantial efficiency by ruling out invalid substrings early, and not repeating end permutations.

e.g. given the word apple, if you had the permutation where you selected "ppl" as the first three characters, no word will be found. So, no matter how you permute the a and the e at the end you cannot construct a word. Early termination of permutations may be important to your algorithm's efficiency.

Mark E
Thanks. This makes sense =)
Rohan
This doesn't help with scrambled words, though. You first have to normalize them as in zerkms' answer
Michael Borgwardt
@Michael, no, you can simply try all permutations. Since the Trie lookups will be incredibly fast the penalty for searching multiple times will be low; granted for long strings there can be a large number of permutations, and this solution won't make sense with words much larger than say, 7 characters
Mark E
+1  A: 

you can also consider pspell

http://php.net/manual/en/book.pspell.php

$ps = pspell_new("en");
foreach(array('alppe', 'plape', 'apple') as $word)
   if(pspell_check($ps, $word))
      echo $word;
stereofrog
A: 

I actually like zerkms's solution better but here's another one

create 2 tables

words
-----
word_id (primary key)
word


letter_index
-----
letter (idx)
word_id (idx)

When you add a word to the words table you have to add an entry to the letter_index for each unique letter. letter_index has a primary key based on both the letter and the word_id.
To find words comprising of a group of letters you create a query something like:

SELECT word FROM words w
// for each letter in the search
INNER JOIN letter_index i ON ( w.word_id = i.word_id AND i.letter = letter_1 )
INNER JOIN letter_index i ON ( w.word_id = i.word_id AND i.letter = letter_2 )
INNER JOIN letter_index i ON ( w.word_id = i.word_id AND i.letter = letter_3 )
...
INNER JOIN letter_index i ON ( w.word_id = i.word_id AND i.letter = letter_n )
meouw