Algorithm to generate a crossword

views:

2466

answers:

+12 Q:

Algorithm to generate a crossword

Given a list of words, how would you go about arranging them into a crossword grid?

It wouldn't have to be like a "proper" crossword puzzle which is symmetrical or anything like that: basically just output a starting position and direction for each word.

+2 A:

I would get an index of each letter used by each word to know possible crosses. Then I would choose the largest word and use it as base. Select the next large one and cross it. Rinse and repeat. It's probably an NP problem.

Another idea is creating a genetic algorithm where the strength metric is how many words you can put in the grid.

The hard part I find is when to know a certain list cannot possibly be crossed.

eipipuz 2009-06-03 05:15:49

I was also thinking of a genetic algorithm. The fitness function could be how tightly the words are packed into the grid.

Adrian McCarthy 2009-06-03 21:26:29

+4 A:

I actually wrote a crossword generation program about ten years ago (it was cryptic but the same rules would apply for normal crosswords).

It had a list of words (and associated clues) stored in a file sorted by descending usage to date (so that lesser-used words were at the top of the file). A template, basically a bit-mask representing the black and free squares, was chosen randomly from a pool that was provided by the client.

Then, for each non-complete word in the puzzle (basically find the first blank square and see if the one to the right (across-word) or the one underneath (down-word) is also blank), a search was done of the file looking for the first word that fitted, taking into account the letters already in that word. If there was no word that could fit, you just marked the whole word as incomplete and moved on.

At the end would be some uncompleted words which the compiler would have to fill in (and add the word and a clue to the file if desired). If they couldn't come up with any ideas, they could edit the crossword manually to change constraints or just ask for a total re-generation.

Once the word/clue file got up to a certain size (and it was adding 50-100 clues a day for this client), there was rarely a case of more than two or three manual fix ups that had to be done for each crossword.

paxdiablo 2009-06-03 05:22:59

This doesn't actually help me in my situation, since I'll have a list of only about 6-12 words. Mine is more like an learning exercise for the user than a word puzzle. +1 for the interesting algorithm anyway though!

nickf 2009-06-03 07:08:01

Nice description. I thought about this a few times in the past, but never tried it. Now for the magic question: how well did it work? Just for sparse puzzles, or also for dense ones (like in the paper)? And how many clues do you need for dense puzzles?

dmckee 2009-06-03 15:51:45

@dmckee, it was a while ago but from memory, even the dense puzzles were pretty good. Many were completed without intervention but you'd still get maybe a fifth requiring one or two extra words added. And we're talking about thousands of words in the file. No doubt backtracking could have helped but it was easier just for the client to reject one with (e.g.) 5 unfinished words than worry about trying to come up with extra clues. Five was about the outer limit I saw for unfinished words.

paxdiablo 2009-06-03 23:39:53

+3 A:

I'd generate two numbers: Length and Scrabble score. Assume that a low Scrabble score means it's easier to join on (low scores = lots of common letters). Sort the list by length descending and Scrabble score ascending.

Next, just go down the list. If the word doesn't cross with an existing word (check against each word by their length and Scrabble score, respectively), then put it into the queue, and check the next word.

Rinse and repeat, and this should generate a crossword.

Of course, I'm pretty sure that this is O(n!) and it's not guaranteed to complete the crossword for you, but perhaps somebody can improve it.

Eric 2009-06-03 05:26:01

+4 A:

I came up with a solution which probably isn't the most efficient, but it works well enough. Basically:

Sort all the words by length, descending.
Take the first word and place it on the board.
Take the next word.
Search through all the words that are already on the board and see if there are any possible intersections (any common letters) with this word.
If there is a possible location for this word, loop through all the words that are on the board and check to see if the new word interferes.
If this word doesn't break the board, then place it there and go to step 3, otherwise, continue searching for a place (step 4).
Continue this loop until all the words are either placed or unable to be placed.

This makes a working, yet often quite poor crossword. There were a number of alterations I made to the basic recipe above to come up with a better result.

At the end of generating a crossword, give it a score based on how many of the words were placed (the more the better), how large the board is (the smaller the better), and the ratio between height and width (the closer to 1 the better). Generate a number of crosswords and then compare their scores and choose the best one.
- Instead of running an arbitrary number of iterations, I've decided to create as many crosswords as possible in an arbitrary amount of time. If you only have a small word list, then you'll get dozens of possible crosswords in 5 seconds. A larger crossword might only be chosen from 5-6 possibilities.
When placing a new word, instead of placing it immediately upon finding an acceptable location, give that word location a score based on how much it increases the size of the grid and how many intersections there are (ideally you'd want each word to be crossed by 2-3 other words). Keep track of all the positions and their scores and then choose the best one.

nickf 2009-06-20 15:02:03

I happen to be writing this program as we speak, and this is the identical algorothm I chose also. For small numbers of words (10 or less), the program has no trouble calculated all possible solutions in milliseconds. The algoritm is exponential though.The easy part is writing the basic algorithm that brute-forces through all possible combinations. The hard part is the dozen or so 'short-cuts' you need to prevent the program from trying all the dead-end solutions.

2009-07-24 00:47:11

"5. ... and check if the new word interferes"How do you account for situations where the new word is place alongside an existing word, which generates gibberish in the places where it has adjacent squares? E.g.: LEMON ERASE If "LE", "ER" and "MA" etc. are not words in your list, this is wrong. On the other hand, outright rejecting such adjacencies might throw away really good grids, like: W LEMON ERASE NEXUS T T

Caffeine Coma 2009-09-22 03:18:36

Oh well, so much for my formatting.

Caffeine Coma 2009-09-22 03:19:20

@Kaffeine, yep I know what you mean - I had to throw out these options because even though they could create really good grids, it's too hard to check *(read: I couldn't be bothered)*, and chances are it's just interference anyway.

nickf 2009-09-22 04:45:12

Why not just use a random probabilistic approach to start with. Start with a word, and then repeatedly pick a random word and try to fit it into the current state of the puzzle without breaking the constraints on the size etc.. If you fail, just start all over again.

You will be surprised how often a Monte Carlo approach like this works.

Yogi 2009-06-20 15:07:58

+1 A:

I just recently wrote my own in Python. You can find it here: http://bryanhelmig.com/python-crossword-puzzle-generator/. It doesn't create the dense NYT style crosswords, but the style of crosswords you might find in a child's puzzle book.

Unlike a few algorithms I found out there that implemented a random brute-force method of placing words like a few have suggested, I tried to implement a slightly smarter brute-force approach at word placement. Here's my process:

Create a grid of whatever size and a list of words.
Shuffle the word list, and then sort the words by longest to shortest.
Place the first and longest word at the upper left most position, 1,1 (vertical or horizontal).
Move onto next word, loop over each letter in the word and each cell in the grid looking for letter to letter matches.
When a match is found, simply add that position to a suggested coordinate list for that word.
Loop over the suggested coordinate list and "score" the word placement based on how many other words it crosses. Scores of 0 indicate either bad placement (adjacent to existing words) or that there were no word crosses.
Back to step #4 until word list is exhausted. Optional second pass.
We should now have a crossword, but the quality can be hit or miss due to some of the random placements. So, we buffer this crossword and go back to step #2. If the next crossword has more words placed on the board, it replaces the crossword in the buffer. This is time limited (find the best crossword in x seconds).

By the end, you have a decent crossword puzzle or word search puzzle, since they are about the same. It tends to run rather well, but let me know if you have any suggestions on improvement. Bigger grids run exponentially slower; bigger word lists linearly. Bigger word lists also have a much higher chance at better word placement numbers.

Bryan 2010-04-11 18:41:16

I've been thinking about this problem. My sense is that to create a truly dense crossword, you cannot hope that your limited word list is going to be enough. Therefore, you might want to take a dictionary and place it into a "trie" data structure. This will allow you to easily find words that fill the left over spaces. In a trie, it is fairly efficient to implement a traversal that, say, gives you all words of the form "c?t".

So, my general thinking is: create some sort of relatively brute force approach as some described here to create a low-density cross, and fill in the blanks with dictionary words.

If anyone else has taken this approach, please let me know.

Jake 2010-09-22 17:34:08

ansaurus

tags:

views:

answers:

Algorithm to generate a crossword

related questions