views:

59

answers:

1

I'm creating a word search and am trying to calculate quality of the generated puzzles by verifying the word set is "distributed evenly" throughout the grid. For example placing each word consecutively, filling them up row-wise is not particularly interesting because there will be clusters and the user will quickly notice a pattern.

How can I measure how 'evenly distributed' the words are?

What I'd like to do is write a program that takes in a word search as input and output a score that evaluates the 'quality' of the puzzle. I'm wondering if anyone has seen a similar problem and could refer me to some resources. Perhaps there is some concept in statistics that might help? Thanks.

+3  A: 

The basic problem is distribution of lines in a square or rectangle. You can eighter do this geometrically or using integer arrays. I will try the integer arrays here.

Let M be a matrix of your puzzle,

A B C D
E F G H
I J K L
M N O P

Let the word "EFGH" be an existent word, as well as "CGKO". Then, create a matrix which will contain the count of membership in eighter words in each cell:

0 0 1 0
1 1 2 1
0 0 1 0
0 0 1 0

Apply a rule: the current cell value is equal to the sum of all neighbours (4-way) and multiply with the cell's original value, if the original value is 2 or higher.

0 0 1 0      1 2 2 2
1 1 2 1  -\  1 3 8 2
0 0 1 0  -/  1 2 3 2
0 0 1 0      0 1 1 1

And sum up all values in rows and columns the matrix:

1 2 2 2 =  7
1 3 8 2 = 14
1 2 3 2 =  8
0 1 1 1 =  3
| | | |
3 7 | 6
    14

Then calculate the avarage of both result sets:

(7 + 14 + 8 + 3) / 4 = 32 / 4 = 8
(3 + 7 + 14 + 6) / 4 = 30 / 4 = 7.5

And calculate the avarage difference to the avarage of each result set:

3  <-> 7.5 = 4.5       7  <-> 8 = 1
7  <-> 7.5 = 0.5       14 <-> 8 = 6
14 <-> 7.5 = 6.5       8  <-> 8 = 0
6  <-> 7.5 = 1.5       3  <-> 8 = 5
             ___avg               ___avg
             3.25                 3

And multiply them together:

3 * 3.25 = 9.75

Which you treat as a distributionscore. You might need to tweak it a little bit to make it work better, but this should calculate distributionscores quite nicely.

Here is an example of a bad distribution:

1 0 0 0      1 1 0 0      2
1 0 0 0  -\  2 1 0 0  -\  3         -\  C avg 2.5  -\  C avg-2-avg 0.5
1 0 0 0  -/  2 1 0 0  -/  3         -/  R avg 2.5  -/  R avg-2-avg 2.5
1 0 0 0      1 1 0 0      2                                       _____*
                           6 4 0 0                                 1.25 < score

Edit: calc. errors fixed.

Pindatjuh
Wow, this is an amazing algorithm. Thank you for taking the time to write up the response, I really do appreciate it! By the way, does this algorithm have a name or is there a reference (like an algorithms textbook) I can follow up on if I were interested in the nitty gritty details? Also I think I found an error in the summation. Here is the correction.0 0 1 0 1 2 2 21 1 2 1 -\ 1 3 8 20 0 1 0 -/ 1 2 3 20 0 1 0 0 1 1 1Once again, thank you for sharing!
qwer
The formatting mangled the correction but the first row after the step"Apply a rule: the current cell value is equal to the sum of all neighbours (4-way) and multiply with the cell's original value, if the original value is 2 or higher."Should be 1 2 2 2
qwer
Also this has nothing to do with your response but I was not notified of your answer even though I explicitly marked the checkbox to register for changes to this thread. The feature appears to be broken.
qwer
Pindatjuh