tags:

views:

568

answers:

5

I'm making algorithm like crossword, but I dont know how to design the algorithm.

For example:

  • there are words like 'car', 'apple' in the dictionary.
  • the word 'app' is given on the board.
  • there are letters like 'l' 'e' 'c' 'r'....for making words.

So the algorithm's task is to make correct words which are stored in dictionary.

app -> lapp -> leapp -> lecapp -> .... -> lappe -> eappc -> ... -> appl -> apple (correct answer)

What is the best solution for this algorithm?

+1  A: 

Store your dictionary as a tree, for example:

          *
          |
     +----+----+
     |         |
     A         B
     |         |
  +--+--+      E
  |     |      |
  P     S    +-+-+
  |     |    |   |
  P     K    A   E
  |          |
+-+-+      +-+-+
|   |      |   |
E   L      D   N
|   |
A   E
|
L

Thanks paxdiablo for making my tree more readable.

This tree has the words appeal, apple, ask, bead, bean, be, and bee. Mark nodes saying "If I were to stop here, this would be a valid word" such as the 'e' below 'b' for 'be'.

When you find a letter you don't know, use a wildcard (i.e., pick all children and recurse down all paths).

You said crossword, but then your "letters...for making words" seemed to indicate Scrabble. This would work for either. Not the fastest, but plenty fast.

Thanks Andreas for reminding us that this is called a trie.

If you wanted to say "the second letter is a P" you would start from the root node and take every branch (which would be every letter in the alphabet, assuming it's a proper dictionary) and then the "P" branch and then go on from there.

Zwergner
How do you search this dictionary with restrictions on non-primary letters? e.g. the second letter must be a P?
Kirk Broadhurst
This is called a "trie" or prefix tree.
Andreas Brinck
+2  A: 

I've actually written a crossword program before (cryptic but the theory behind the construction is identical).

I had a database of words and their clues which could be sorted by times used (so that I wouldn't tend to get duplicate crosswords on subsequent runs).

The first thing you should do is design your patterns (blacks where you can't put letters and whites where you can). Trying to fit words into a grid while creating the pattern on the fly is very time consuming and prone to errors. If you look at most crosswords, they tend to follow certain rules to make it easier. Things like being symmetrical around one of the diagonals and disallowing a square of four white cells (to ease the task of selecting suitable words).

Once you have the pattern, then you start finding words to place in it. That way, you would know that "app" was the start of the word and be able to limit your searches to those starting with "app", not every word that has "app" in it. Similarly for words where you have known letters at any positions. It's a lot easier to locate words with letters at known positions than to evaluate those letters at any starting position within a word.

Mine ended up being written in shell script (believe it or not) and using the dictionary that came from Linux as a word search tool. If you know you have a 5-letter word starting with "app", it's quite easy to use:

grep '^app..$' words.txt

to get a list of all valid possibilities.

And, as each word was found, it was copied to a clues.txt file that contained both the word and multiple possible clues. The actual format was use {count, word, clue} where the same word may exist on multiple lines with a different clue - this allowed piping of the grep through sort so that lesser used words/clues floated to the top (whenever a word/clue was used, its count was incremented, making it less likely to be used next time).

Once that file was a decent size, the program would use it first to locate words and, only if one wasn't found would it revert to the words file (sans clues) where manual intervention would be required.

It actually ended up being quite good at doing the job. It wasn't blindingly fast but I didn't need to generate one every three seconds - this was for a community newsletter sent out once a week.


Now that you've changed the question to a Scrabble variant, that's actually much harder.

You need to take into account the letters you have, the letters on the board and the fact that there's a lot more places that you have to evaluate. This makes a brute force method much harder.

What I would do as an initial cut would be to select possibilities (starting position and direction on the board) chosen randomly, then use the same algorithm as for the crossword variant above to locate all words that can fit there. Then, if you have the letters to satisfy that word, store it (along with its score) in a list.

Keep in mind that you need to watch out for interfering with other words on the board.

I would continue to examine possibilities until one of:

  • your list is big enough to choose from.
  • you run out of time.
  • you've examined enough possibilities to satisfy your competency level.

That last one's important - if you're playing a beginner, you don't want to exhaustively examine millions of possibilities.

Then, choose the best move from your list (or maybe not the best if playing at beginner level - it all depends on how good you want the computer to be).

paxdiablo
A: 

If you're trying to create an index of words such that you could attempt to "solve" (or create) crosswords then I guess you'd start with a dictionary of words indexed by length. Then you'd create another dictionary of dictionaries of dictionaries... the first index is by total word length while the second one is by length, then by letter position, and finally by letter (six letter words with a second letter of "i" for example).

After you've built this index you can then express each step in trying to set or solve a puzzle in terms of set operations performed on these. (For example a 8 letter word starting with "w" and ending with "k" would be the intersection of all 8 letter words starting with "w" and all those ending with "k" --- which, unsurprisingly would include "homework"). Having built the indexed data structure I described, of course, allow for much more efficient searches for possible matches then would be possible by performing linear scans of just the global word list or even linear scans of the length separated lists).

Once you have this basic data structure then the rest of the program would, presumably, be a tree generation and traversal (with backtracking of course). Create a program that generates every possibility (using the data structure described) and a whenever it "gets stuck" have it backtrack until it finds new possibilities.

As implied by paxdiablo, you'll have to include a large pool of "words" for the generator to have a reasonable chance of creating a completed "solution." Anyone who is experienced with crosswords realizes that they allow the setter to take quite a few liberties (such as frequent use of compass points, archaic terms and poetic contracts) in order to get themselves o'er the hump as it were.

I haven't personally written a crossword generator. I have written cryptogram solvers which used a similar, though much simpler indexing structure. (To find every word which zyzxw could be in a cryptogram you "abstract" it into a pattern: abacd. Your dictionary contains every word indexed by its abstraction, and you can easily find that "every" matches "zyzxw"). In that case linear searches through the lists started at each abstraction is reasonably quick even when you're correlating to find out that "uvz" with "zyzxw" could, indeed, be "the" ... for example). I've also written a simple "Jotto" game which doesn't benefit from indexing at all --- linear scanning through the few thousand 5 or 6 letter words at each elimination step used to take far less than a second on my old 6 Mhz XT in the pre-history of modern PC computing).

Jim Dennis
A: 

There appear to be a few components to this problem.

Firstly you need to identify a collection of patterns that your word needs to match. In your example (with APP appearing on the board), your possible patterns would be APP, A and P. If there are other words on the board then things get considerably more difficult - you might have to touch two or three existing words, you have limitations on how many letters you can fit before or after an existing word, etc. But you need to define the pattern requirement first.

You then need a searching algorithm for your dictionary - you give it your patterns and your available letters, and it returns a list of all possible matches. This could be done with regular expressions across a huge list of words but it's likely to be far too slow - Zwergner's answer offers a good option for investigation. It's probably reasonable to get all the words that match your available set of letters and then filter those that match your patterns.

Finally, I expect that you need to calculate the scores for all possible words - or even return the word with top score only. Seemingly simple but if you are really working a scrabble game (are you?) you have the extra challenge of the double/triple letter/word squares; these will need to be created in a 1-1 relationship with your possible word patterns.

If you begin fairly simply like this you should be able to get a working algorithm - once you have that you can optimise your dictionary to balance letter matching vs pattern matching, and include score calculation if required.

Kirk Broadhurst
+1  A: 

You might be interested in Googling the research paper "The World's Fastest Scrabble Program" by Appel and Jacobson (1988). The algorithms are outlined in pseudocode, so it takes a bit of work to mold that into useable form and glue it all together; however, the program the authors outline works great.

allclaws
Indeed seems very good. I particularly appreciate the use of a finite automaton instead of the classical trie.
Matthieu M.