I'm making a boggle-like word game. The user is given a grid of letters like this:

O V Z W X
S T A C K
Y R F L Q

The user picks out a word using any adjacent chains of letters, like the word "STACK" across the middle line. The letters used are then replaced by the machine e.g. (new letters in lowercase):

O V Z W X
z e x o p
Y R F L Q

Notice you can now spell "OVeRFLoW" by using the new letters. My problem is: What algorithm can I use to pick new letters that maximizes the number of long words the user can spell? I want the game to be fun and involve spelling e.g. 6 letter words sometimes but, if you pick bad letters, games involve the user just spelling 3 letter words and not getting a chance to find larger words.

For example:

  • You could just randomly pick new letters from the alphabet. This does not work well.

  • Likewise, I found picking randomly but using the letter frequencies from Scrabble didn't work well. This works better in Scrabble I think as you are less constrained about the order you use the letters in.

  • I tried having a set of lists, each representing one of the dice from the Boggle game, with each new letter picked from a random side of a random die (I also wonder whether I can legally use this data in a product). I didn't notice this working well. I imagine the Boggle dice sides were chosen in some sensible manner, but I cannot find how this was done.

Some ideas I've considered:

  • Make a table of how often letter pairs occur together in the dictionary. For the sake of argument, say E is seen next to A 30% of the time. When picking a new letter, I would randomly pick a letter based on the frequency of this letter occurring next to a randomly chosen adjacent letter on the grid. For example, if the neighboring letter was E, the new letter would be "A" 30% of the time. This should mean there are lots of decent pairs to use scattered around the map. I could maybe improve this by making probability tables of a letter occurring between two other letters.

  • Somehow do a search for what words can be spelt on the current grid, taking the new letters to be wildcards. I would then replace the wildcards with letters that allowed the biggest words to be spelt. I'm not sure how you would do this efficiently however.
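
The pair-table idea above can be sketched in a few lines of Python. This is only a rough illustration: the word list is a toy stand-in for a real dictionary, and the fallback letters for unseen neighbours are an arbitrary choice.

```python
import random
from collections import Counter

def build_pair_table(words):
    """Count how often each ordered letter pair occurs in dictionary words."""
    pairs = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs

def pick_letter(pairs, neighbour):
    """Sample a new letter weighted by how often it follows `neighbour`."""
    candidates = [(b, n) for (a, b), n in pairs.items() if a == neighbour]
    if not candidates:
        return random.choice("etaoinshrdlu")  # fallback: common English letters
    letters, weights = zip(*candidates)
    return random.choices(letters, weights=weights)[0]

pairs = build_pair_table(["stack", "overflow", "steer", "stare"])
print(pick_letter(pairs, "s"))  # a letter that follows 's' in the sample words
```

The "letter between two other letters" refinement would just condition the table on a pair of neighbours instead of one.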

Any other ideas are appreciated. I wonder if there is a common way to solve this problem, and what approaches other word games use.

Edit: Thanks for the great answers so far! I forgot to mention, I'm really aiming for low memory/CPU requirements if possible. I'm probably going to use the SOWPODS dictionary (about 250,000 words) and my grid will be about 6 x 6.

A: 

I do not know about a precanned algorithm for this, but...

There is a dictionary file on UNIX, and I imagine there is something similar available on other platforms (maybe even in the Java libraries? - google it). Anyway, use the files the spell checker uses.

After they spell a word and it drops out, you have existing letters and blank spaces.

1) From each existing letter, go right, left, up, down (you will need to understand recursive algorithms). As long as the string you have built so far is found at the start of words or backwards from the end of words in the dictionary file, continue. When you come across a blank space, count the frequency of the letters you need next. Use the most frequent letters.

It will not guarantee a word, as you have not checked the corresponding ending or beginning, but I think it would be much easier to implement than an exhaustive search and would still get pretty good results.
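
A rough Python illustration of the walk described above, under simplifying assumptions: a toy word list stands in for the dictionary file, only forward prefixes are checked (the backwards-from-word-endings pass is omitted), and visited cells are not tracked.

```python
from collections import Counter

# Toy stand-ins for the spell-checker dictionary mentioned above.
WORDS = ["stack", "stare", "overflow", "track"]
PREFIXES = {w[:i] for w in WORDS for i in range(1, len(w) + 1)}

def tally_blank_needs(grid):
    """Walk right/left/up/down from each letter. While the string built so
    far is a valid word prefix, keep going; on hitting a blank (None), tally
    every letter the dictionary would accept next."""
    rows, cols = len(grid), len(grid[0])
    needed = Counter()

    def walk(r, c, prefix):
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            ch = grid[nr][nc]
            if ch is None:
                for letter in "abcdefghijklmnopqrstuvwxyz":
                    if prefix + letter in PREFIXES:
                        needed[letter] += 1
            elif prefix + ch in PREFIXES:
                # A real version should also track visited cells.
                walk(nr, nc, prefix + ch)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is not None and grid[r][c] in PREFIXES:
                walk(r, c, grid[r][c])
    return needed

grid = [["s", None, "a"],
        ["o", "t",  "r"]]
print(tally_blank_needs(grid).most_common())
```

The most frequent letters in the tally are the ones to place in the blanks.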

Jeff
Could you give a short example? I'm not sure how this would work.
BobbyJim
A: 

I think this will get you a step closer to your destination: http://en.wikipedia.org/wiki/Levenshtein_distance

Trevoke
+6  A: 

Here's a simple method:

Write a fast solver for the game using the same word list that the player will use. Generate say 100 different possible boards at random (using letter frequencies is probably a good idea here, but not essential). For each board calculate all the words that can be generated and score the board based on the number of words found or the count weighted by word length (i.e. the total sum of word lengths of all words found). Then just pick the best scoring board from the 100 possibilities and give that to the player.

Also instead of always picking the highest scoring board (i.e. the easiest board) you could have different score thresholds to make the game more difficult for experts.
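
A minimal sketch of this generate-and-score loop in Python, assuming Boggle-style adjacency (diagonals included, no cell reuse) and a toy word list; `solve` is the fast solver and `best_fill` is the candidate loop described above.

```python
import random
import string

WORDS = {"stack", "track", "cat", "act", "rat"}
PREFIXES = {w[:i] for w in WORDS for i in range(1, len(w) + 1)}

def solve(board):
    """Return every dictionary word spellable via chains of adjacent cells."""
    rows, cols = len(board), len(board[0])
    found = set()

    def dfs(r, c, prefix, used):
        prefix += board[r][c]
        if prefix not in PREFIXES:
            return  # prune: no word starts this way
        if prefix in WORDS:
            found.add(prefix)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in used):
                    dfs(nr, nc, prefix, used | {(nr, nc)})

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, "", {(r, c)})
    return found

def best_fill(board, blanks, candidates=100):
    """Try `candidates` random fills of the blank cells; keep the fill whose
    board scores highest on total word length."""
    best, best_score = None, -1
    for _ in range(candidates):
        trial = [row[:] for row in board]
        for (r, c) in blanks:
            trial[r][c] = random.choice(string.ascii_lowercase)
        score = sum(len(w) for w in solve(trial))
        if score > best_score:
            best, best_score = trial, score
    return best
```

For the difficulty-threshold variant, sort the candidates by score and pick from a lower percentile instead of the maximum.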

Mark Byers
Thanks. This is probably the most bullet-proof idea in that you could e.g. guarantee (most of the time) that there will always be a certain number of large words to pick. My board will be 6x6 and using a trie takes too much memory so I'm not sure how I could use this efficiently though.
BobbyJim
Using a word prefix list (trie) is going to give the best performance if you have the memory. If you store the trie compressed, you can probably fit a full trie in a few MBs, I'd guess. If not, you can still probably keep a word prefix list of up to length 5 in memory, then switch to binary (or interpolated) search of the full word list to check for matches longer than 5. Alternatively... count the prefixes up to length 5 and assume that lots of small partial words gives a good chance of a long word, without explicitly checking for the long words.
Mark Byers
If you're daring you could use a DAWG that is stored in an array. There is an excellent video lecture from Stanford on that, found here: http://www.youtube.com/watch?v=TJ8SkcUSdbU The short story is she managed to store 250,000 words in 0.32 MB.
Niki Yoshiuchi
If you can push this solving work to a separate thread, the solver can work ahead while the user is playing the game. This will give your solver more time to work.
Mr. Berna
+1  A: 

A minor variation on the letter-pair approach: use the frequency of letter pairs in long words - say 6 letters or longer - since that's your objective. You could also develop a weighting that included all adjacent letters, not just a random one.
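
One possible reading of the "all adjacent letters" weighting, sketched in Python with a toy long-word list: score each candidate letter by summing its pair frequency with every filled neighbour, rather than conditioning on one random neighbour. The +1 smoothing keeping unseen letters possible is an added assumption.

```python
import random
from collections import Counter

# Build a symmetric pair table from long words only (6+ letters),
# since long words are the objective.
LONG_WORDS = [w for w in ("overflow", "stacked", "letters", "strings")
              if len(w) >= 6]
PAIRS = Counter()
for w in LONG_WORDS:
    for a, b in zip(w, w[1:]):
        PAIRS[(a, b)] += 1
        PAIRS[(b, a)] += 1

def weight(letter, neighbours):
    """Score a candidate letter against every filled neighbouring cell."""
    return sum(PAIRS[(letter, n)] for n in neighbours)

def pick(neighbours):
    letters = "abcdefghijklmnopqrstuvwxyz"
    weights = [weight(l, neighbours) + 1 for l in letters]  # +1 smoothing
    return random.choices(letters, weights=weights)[0]
```

With a real dictionary, `LONG_WORDS` would be the subset of SOWPODS at the chosen length cutoff.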

Carl Manaster
Nice idea about using the 6-letter-long words! I considered using trigrams (only considering the frequencies of 3-letter sequences) but your idea sounds closer to what I really want.
BobbyJim
A: 

This wordgame I slapped up a while back, which behaves very similarly to what you describe, uses English frequency tables to select letters, but decides first whether to generate a vowel or consonant, allowing me to ensure a given rate of vowels on the board. This seems to work reasonably well.
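
A sketch of that two-stage pick in Python. The frequency values below are illustrative placeholders, and treating the 0.559 figure (mentioned in the comments) as the probability of generating a vowel is only one interpretation of the rate.

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
# Illustrative relative frequencies; moonshadow derived real values
# from ebook statistics.
VOWEL_FREQ = {"a": 8.2, "e": 12.7, "i": 7.0, "o": 7.5, "u": 2.8}
CONS_FREQ = {"t": 9.1, "n": 6.7, "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3,
             "l": 4.0, "c": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0,
             "y": 2.0, "p": 1.9, "b": 1.5, "v": 1.0, "k": 0.8, "j": 0.15,
             "x": 0.15, "q": 0.10, "z": 0.07}

def new_letter(vowel_rate=0.559):
    """First decide vowel vs consonant, then sample within that class."""
    freq = VOWEL_FREQ if random.random() < vowel_rate else CONS_FREQ
    letters, weights = zip(*freq.items())
    return random.choices(letters, weights=weights)[0]
```

Fixing the vowel rate up front is what guarantees the board never starves of vowels, whatever the per-letter frequencies say.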

moonshadow
Thanks. What did you use for the vowel/consonant rate? My feeling is that every local 2x2 grid should probably contain at least one vowel. Otherwise, you might get 'trapped' groups of consonants in the corners that you cannot use in words. Did you just use regular letter frequency tables, and not e.g. paired letter frequencies?
BobbyJim
@Bobby: because the board mutates after each word, the player can "chip away" at clumps of difficult letters over time - one might think of that as part of the game strategy. The vowel/consonant rate is hardwired to 0.559 - I obtained that value and the letter frequencies by gathering stats on some ebooks I had lying around :)
moonshadow
OK, thanks. I've actually tested my game with the falling behavior but I've found players tend to ignore the bottom letters when the letters there aren't very good and they spend all their time at the top. I was thinking about letters falling in from all directions somehow. Or make it a requirement to dispose of old letters. Also, falling letters make it hard to e.g. fix the number of vowels in local grid positions. I might be over thinking this though. :) I would just quite like it if e.g. every grid had at least one long word in it so experts could show off.
BobbyJim
+2  A: 

You should look up n-gramming, and Markovian Models.

Your first idea is very loosely related to Markovian algorithms. Basically, if you have a large text corpus, say of 1000 words, you can analyse each letter and build a table giving the probability of a certain letter following the current letter.

For example, say the letter Q in my 1000 words (4000 letters in total) is used only 40 times. Then I calculate which letters probably follow it using my Markov hash table.

For example, QU happens 100% of the time, so I know that should Q be randomly chosen by your application, I need to make sure that the letter U is also included. Then, the letter "I" follows 50% of the time, "A" 25% of the time and "O" 25% of the time.

It's actually really complicated to explain, and I bet there are other explanations out there which are much better than this.

But the idea is that, given a legitimately large text corpus, you can create a chain of X letters which is probably consistent with the English language and thus should be easy for users to make words out of. You can choose how far to look ahead via the n-gram size; the higher the number, the easier you could make your game. For example, an n-gram of two would probably make it very hard to create words over 6 letters, but an n-gram of 4 would be very easy.

The Wikipedia article explains it really badly, so I wouldn't follow that.

Take a look at this Markov generator:

http://www.haykranen.nl/projects/markov/demo/
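
The next-letter table this answer describes can be sketched in a few lines of Python. The corpus here is a toy list, and the fallback for unseen contexts is an arbitrary choice:

```python
import random
from collections import Counter, defaultdict

CORPUS = ["change", "changing", "chant", "channel", "chance"]
N = 3  # context length: an (N+1)-gram model

# Map each N-letter context to counts of the letters that follow it.
table = defaultdict(Counter)
for word in CORPUS:
    for i in range(len(word) - N):
        table[word[i:i + N]][word[i + N]] += 1

def next_letter(context):
    """Sample the next letter given the last N letters of the chain."""
    counts = table.get(context[-N:])
    if not counts:
        return random.choice("etaoins")  # fallback for unseen contexts
    letters, weights = zip(*counts.items())
    return random.choices(letters, weights=weights)[0]

print(next_letter("chan"))  # one of 'g', 't', 'n', 'c' in this tiny corpus
```

This also answers the "CHAN" question in the comments: the table keyed on "han" here would be keyed on "chan" with N=4.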

Laykes
Thanks, sounds interesting. Could you elaborate a bit more about the n-gram of 4 idea? Would I e.g. pick an adjacent chain of 4 letters, say "C-H-A-N", near my random letter location, then ask a table to pick a letter that usually follows the 3 letters "CHAN" e.g. "G" as in "CHANGING"?
BobbyJim
I've always been scared of Markov chains. The main wiki article is confusing but this one is quite good: http://en.wikipedia.org/wiki/Examples_of_Markov_chains
BobbyJim
n-gramming is just where you break something down into grams of N letters. For example, the 1-grams of the word BOGGLE are: B, O, G, G, L, E. The 2-grams (commonly called bigrams) are: BO, OG, GG, GL, LE. The 3-grams (commonly called trigrams) are: BOG, OGG, GGL, GLE. The 4-grams are: BOGG, OGGL, GGLE. You can see how, if you use a Markov chain with a particular n-gram size, you can group commonly occurring character sequences. Incidentally, as you increase the n-gram size you will find the game becomes easier.
Laykes
A: 

You might look at this Java implementation of the Jumble algorithm to find sets of letters that permute to multiple dictionary words:

$ java -jar dist/jumble.jar | sort -nr | head
11 Orang Ronga angor argon goran grano groan nagor orang organ rogan 
10 Elaps Lepas Pales lapse salep saple sepal slape spale speal 
9 ester estre reest reset steer stere stree terse tsere 
9 caret carte cater crate creat creta react recta trace 
9 Easter Eastre asteer easter reseat saeter seater staree teaser 
9 Canari Carian Crania acinar arnica canari carina crania narica 
8 leapt palet patel pelta petal plate pleat tepal 
8 laster lastre rastle relast resalt salter slater stelar 
8 Trias arist astir sitar stair stria tarsi tisar 
8 Trema armet mater metra ramet tamer terma trame 
...
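
The core of the Jumble trick is cheap to reproduce in any language: sort each word's letters to get a canonical key, and words sharing a key are mutual anagrams. A Python sketch with a toy word list (the real tool runs over a full dictionary):

```python
from collections import defaultdict

WORDS = ["angor", "argon", "groan", "orang", "organ", "caret",
         "carte", "cater", "crate", "react", "recta", "trace", "stack"]

# Group words by their sorted letters: anagram sets share one key.
groups = defaultdict(list)
for w in WORDS:
    groups["".join(sorted(w))].append(w)

# Letter sets that permute to many words, most fruitful first.
ranked = sorted(groups.values(), key=len, reverse=True)
for words in ranked[:2]:
    print(len(words), " ".join(words))
# prints:
# 7 caret carte cater crate react recta trace
# 5 angor argon groan orang organ
```

Letter sets near the top of this ranking are good candidates to feed into the grid, since they permute to many words.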
trashgod