
I know there have been several posts about random word generation based on large dictionaries or web lookups. However, I'm looking for a word generator that I can use to create strong passwords without symbols. What I need is a reliable way to generate a random, made-up word of a given length that sounds English but isn't a real English word.

An example of the kind of word I mean is "ratanta".

Are there any algorithms that understand which syllables fit together and can therefore generate a pronounceable string? I know that some CAPTCHA-style controls generate these kinds of words, but I'm unsure whether they use an algorithm or also draw them from a large word list.

If there are any .NET implementations of this kind of functionality, I would be very interested to know.

Thanks.

A:

I'd use a Markov chain algorithm for this.

In summary:

  1. Build a dictionary. Iterate through the letters in an example piece of English text. Build a data structure that maps pairs of letters. Against each pair, record a probability that the second letter appears immediately after the first.
  2. Generate your text. Using the map that you built in (1), pick a sequence of random letters. When deciding what letter to write next, look at the letter you wrote most recently, and use that letter to determine the probability of the next letter.
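The two steps above can be sketched in Python as follows. This is an illustrative sketch, not code from any particular library. `build_bigram_model` and `generate_word` are hypothetical names, and the letter-pair counts are passed to `random.choices` as weights instead of being stored as explicit probabilities:

```python
import random
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Step 1: for each letter, count which letters follow it in the sample text.

    The counts are used directly as weights, which gives the same result as
    storing normalised probabilities.
    """
    model = defaultdict(Counter)
    for raw_word in text.lower().split():
        letters = [c for c in raw_word if c.isalpha()]
        # Only pairs inside a word are counted, so "the cat" gives no "e"->"c" pair.
        for first, second in zip(letters, letters[1:]):
            model[first][second] += 1
    return model


def generate_word(model, length, rng=random):
    """Step 2: pick each next letter using the counts for the previous letter."""
    if length < 1:
        raise ValueError("length must be at least 1")
    starts = sorted(model)  # letters that have at least one known successor
    current = rng.choice(starts)
    word = [current]
    while len(word) < length:
        followers = model.get(current)
        if followers:
            letters = list(followers)
            weights = [followers[c] for c in letters]
            current = rng.choices(letters, weights=weights)[0]
        else:
            # Dead end: this letter only ever ended words, so jump to a random start letter.
            current = rng.choice(starts)
        word.append(current)
    return "".join(word)


if __name__ == "__main__":
    sample = "the rat ran at the cat and the cat sat on a mat"
    model = build_bigram_model(sample)
    print(generate_word(model, 7))
```

For passwords, you would probably also want to reject any output that happens to be a real dictionary word, and pass `random.SystemRandom()` as `rng` instead of using the default generator.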
Tim Robinson
Good link. Readers could probably skip ahead to the Markov Text Generators section, well down the page.
kbrimington
A:

There are several things you can do:

1) Research English syllable structure and generate syllables that follow those rules.

2) Use Markov chains to build a statistical model of English phonology.

There are plenty of resources on Markov chains, but the main idea is to record the probability of each letter appearing after a given sequence of letters. For instance, after "q", "u" is very likely, and after "k", "q" is very unlikely (this uses a first-order Markov chain, which looks at one previous letter); after "th", "e" is very likely (this uses a second-order chain, which looks at the previous two letters).
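To use longer contexts such as "th", key the table on the last `order` letters instead of just one. Below is a hedged Python sketch that assumes a list of sample words as training data. Padding each word with "^" is my own choice, made so that generated words start the way real words do. The function names are illustrative:

```python
import random
from collections import Counter, defaultdict


def build_model(words, order=2):
    """Count which letter follows each `order`-letter context in the sample words.

    Each word is padded with `order` copies of "^", so the first letters of
    generated words are drawn from how real words start.
    """
    model = defaultdict(Counter)
    for raw_word in words:
        word = "".join(c for c in raw_word.lower() if c.isalpha())
        padded = "^" * order + word
        for i in range(order, len(padded)):
            model[padded[i - order:i]][padded[i]] += 1
    return model


def generate(model, length, order=2, rng=random, max_tries=1000):
    """Generate a word of exactly `length` letters, retrying on dead ends."""
    for _ in range(max_tries):
        context = "^" * order
        word = ""
        while len(word) < length:
            followers = model.get(context)
            if not followers:
                break  # no sample word ever continued from this context
            letters = list(followers)
            nxt = rng.choices(letters, weights=[followers[c] for c in letters])[0]
            word += nxt
            context = (context + nxt)[-order:]  # slide the context window
        if len(word) == length:
            return word
    raise ValueError("could not build a word of that length from this sample")


if __name__ == "__main__":
    model = build_model(["banana", "cantata", "tarantula", "patent", "tartan"])
    print(generate(model, 6))
```

Raising `order` makes the output look more like the sample words. With a small word list, it also makes it more likely to reproduce a real word exactly.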

If you go the syllable-model route, you can use resources like this to help you make your intuitions about the language explicit.
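For the syllable route, here is a very rough sketch that treats each syllable as onset + vowel nucleus + coda and joins random syllables together. The inventories below are hypothetical placeholders. They are far simpler than real English syllable structure (no "str", no position-dependent rules), so replace them with whatever your research turns up:

```python
import random

# Hypothetical, heavily simplified inventories. Real English syllables allow
# more (e.g. "str", "spl") and restrict some clusters by position.
ONSETS = ["", "b", "d", "f", "g", "h", "k", "l", "m", "n", "p", "r", "s", "t",
          "v", "bl", "br", "dr", "fl", "fr", "gr", "pl", "pr", "st", "tr"]
NUCLEI = ["a", "e", "i", "o", "u", "ai", "ee", "oo"]
# "" appears several times so that open syllables (no final consonant) are common.
CODAS = ["", "", "", "n", "m", "r", "s", "t", "l", "nd", "st"]


def syllable(rng=random):
    """One syllable: optional onset + vowel nucleus + optional coda."""
    return rng.choice(ONSETS) + rng.choice(NUCLEI) + rng.choice(CODAS)


def pronounceable_word(length, rng=random):
    """Join random syllables, starting over whenever the word overshoots
    `length`, so the result is never cut off in the middle of a syllable."""
    if length < 1:
        raise ValueError("length must be at least 1")
    while True:
        word = ""
        while len(word) < length:
            word += syllable(rng)
        if len(word) == length:
            return word


if __name__ == "__main__":
    print(pronounceable_word(7))
```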

UPDATE:

3) You can make it much simpler by not simulating English at all but, say, Japanese or Italian, where the rules are much simpler; a nonsense word in either is as easy to remember as a nonsense English word. For instance, Japanese has only about 94 valid syllables (47 short, 47 long), so you can easily list them all and pick at random.
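Here is a sketch of the Japanese-style approach. It assumes romanized (Hepburn) syllables and uses only the plain gojūon set, with no voiced variants, long vowels, or syllable-final "n", so the list is smaller than the full inventory mentioned above. The function name is hypothetical:

```python
import random

# Basic romanized (Hepburn) Japanese syllables: the plain gojūon rows only,
# without voiced variants (ga, za, ...), combined sounds (kya, ...),
# long vowels, or the syllable-final "n".
SYLLABLES = [
    "a", "i", "u", "e", "o",
    "ka", "ki", "ku", "ke", "ko",
    "sa", "shi", "su", "se", "so",
    "ta", "chi", "tsu", "te", "to",
    "na", "ni", "nu", "ne", "no",
    "ha", "hi", "fu", "he", "ho",
    "ma", "mi", "mu", "me", "mo",
    "ya", "yu", "yo",
    "ra", "ri", "ru", "re", "ro",
    "wa",
]


def japanese_style_word(length, rng=random):
    """Pick syllables uniformly at random until the word is exactly `length`
    letters, starting over if a syllable would overshoot."""
    if length < 1:
        raise ValueError("length must be at least 1")
    while True:
        word = ""
        while len(word) < length:
            word += rng.choice(SYLLABLES)
        if len(word) == length:
            return word


if __name__ == "__main__":
    print(japanese_style_word(8))
```

The same code should work for an Italian-style word by swapping in a different syllable list.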

Amadan
@Amadan, Markov chains appear to achieve what I'm looking for. I'm going to wait a few days in case better suggestions come in, but otherwise I'll accept this answer then. In the meantime, I've come across a good example of a C# class that implements this, in case anyone is interested: http://www.siliconcommandergames.com/MarkovNameGenerator.htm
Brian Scott