Generating a truly random string of a given length is a fairly straightforward (and already-well-covered) task.

However, I'd like to generate a "pseudo" random string with the additional constraint that it be relatively easy to read (for a native English reader).

I think another way to say this is that the generated string should consist of "recognizable syllables." For example, "akdjfwv" is a random string, but it's not recognizable at all. "flamyom", however, is very "recognizable" (even though it's nonsense).

Obviously, one could make a long list of "recognizable syllables," and then randomly select them.

But is there a better way, such as programmatically generating a "recognizable syllable," or generating a "syllable" and then testing it to see if it's "recognizable"?

I can think of several ways to go about implementing this, but if someone has already implemented it (preferably in Java or C#), I'd rather re-use their work.

Any ideas?

+14  A: 

You could try implementing a Markov chain and give it a suitable passage to process. There is a Java implementation that may work for you.

This is a sample from interpolating between Genesis in English and Genesis in Spanish (N = 1):

In bersaran thelely and avin inder tht teathe m lovig weay waw thod mofin he t thte h fupiteg s o t llissed od ma. lllar t land fingujod maid af de wand tetodamoiz fosu Andesp. ersunen thenas lowhejod whipanirede tifinas Gofuavithila d gió Y Diche fua Dios co l, liens ly Y crerdíquen ticuesereregos hielase agúnd veumarbas iarasens laragún co eruerá laciéluelamagúneren Dien a He.
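For short tokens rather than whole passages, the same idea can be sketched at the character level. This is a minimal sketch and not the linked implementation: the class name `MarkovGibberish`, its methods, and the choice to train per word (so the output contains no spaces) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

/** Hypothetical character-level Markov chain; all names are illustrative. */
class MarkovGibberish {
    private final int order;                       // letters of context (the "N" above)
    private final Map<String, List<Character>> followers = new HashMap<>();
    private final List<String> starts = new ArrayList<>();
    private final Random random;

    MarkovGibberish(int order, Random random) {
        if (order < 1) throw new IllegalArgumentException("order must be >= 1");
        this.order = order;
        this.random = random;
    }

    /** For every run of `order` letters in each word, record the letter that follows. */
    void train(String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (word.length() <= order) continue;   // too short to give a transition
            starts.add(word.substring(0, order));
            for (int i = 0; i + order < word.length(); i++) {
                followers.computeIfAbsent(word.substring(i, i + order), k -> new ArrayList<>())
                         .add(word.charAt(i + order));
            }
        }
    }

    /** Builds a string of exactly `length` lowercase letters. */
    String generate(int length) {
        if (starts.isEmpty()) throw new IllegalStateException("call train() with longer words first");
        StringBuilder sb = new StringBuilder(starts.get(random.nextInt(starts.size())));
        while (sb.length() < length) {
            List<Character> next = followers.get(sb.substring(sb.length() - order));
            if (next == null) {
                // Dead end (context only seen at a word's end): restart from a word beginning.
                sb.append(starts.get(random.nextInt(starts.size())));
            } else {
                sb.append(next.get(random.nextInt(next.size())));
            }
        }
        return sb.substring(0, length);
    }

    public static void main(String[] args) {
        MarkovGibberish chain = new MarkovGibberish(2, new Random());
        chain.train("In the beginning God created the heaven and the earth");
        System.out.println(chain.generate(8));
    }
}
```

A larger training text and a higher order should give output that looks more like the source language, at the cost of reproducing real words more often.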

Rich Seller
I'd +5 for mention of Markov chains, but I can only +1... ;)
Alex Feinman
@Alex, thanks. You can go vote for another answer of mine that you like (if there are any) if you're feeling that generous.
Rich Seller
That would be violating the intent of the point system.
Steve Kuo
While this is extremely interesting stuff (and brilliant), the answer on the Java Password Generator is much closer to my needs. In my question, I probably should have mentioned that I really need the strings to be 6-12 characters long, with no spaces.
Jared
But, no, I'm not actually generating passwords... kind of similar, though. I need to generate strings that will be used as log tokens in automated testing (e.g., generate "names" that will be inserted in a database and used repeatedly in later test cases).
Jared
@Steve Kuo, I don't see how it is "violating the intent of the points system". I said that he could go vote for another answer of mine that he likes (if there are any). Aren't you supposed to vote for answers you approve of? The suggestion was simply to go and have a look.
Rich Seller
+5  A: 

You need to generate random syllables. The simplest way to do it is to use syllables that are consonant-vowel, or consonant-vowel-consonant. From a list of consonants and vowels, pick randomly to build syllables, then join the syllables together to make a string.

Keep in mind that your consonant list should contain phonemes rather than just single letters, so "th", "st", "sl", etc. could all be entries in it.
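A minimal sketch of that approach, assuming small illustrative phoneme lists (the class name `SyllableStrings` and the list contents are made up for the example, and the lists are far from complete):

```java
import java.util.Random;

/** Hypothetical helper: joins random consonant-vowel(-consonant) syllables. */
class SyllableStrings {
    // Illustrative phoneme lists (assumptions): note the multi-letter entries.
    private static final String[] CONSONANTS =
        {"b", "d", "f", "g", "k", "l", "m", "n", "p", "r", "s", "t", "th", "st", "sl"};
    private static final String[] VOWELS = {"a", "e", "i", "o", "u"};

    private static String pick(String[] items, Random rnd) {
        return items[rnd.nextInt(items.length)];
    }

    /** One syllable: consonant-vowel, with an optional trailing consonant. */
    static String syllable(Random rnd) {
        StringBuilder sb = new StringBuilder();
        sb.append(pick(CONSONANTS, rnd)).append(pick(VOWELS, rnd));
        if (rnd.nextBoolean()) {
            sb.append(pick(CONSONANTS, rnd));
        }
        return sb.toString();
    }

    /** Joins the requested number of syllables into one string. */
    static String word(int syllables, Random rnd) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < syllables; i++) {
            sb.append(syllable(rnd));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(word(3, new Random()));
    }
}
```

With these lists a syllable is 2 to 5 characters long, so the syllable count only loosely controls the final length; trimming or retrying would be needed to hit an exact length.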

Ned Batchelder
+5  A: 

I think this should do what you want:

Java Password Generator

It has the source code and a permissive license so you can adapt the source code to what you are looking for.

Yishai
+1  A: 

You really should check out SCIgen. It generates entire semi-nonsense scientific papers: http://pdos.csail.mit.edu/scigen/

And the source is available: it's released under GPL, and is currently available via anonymous CVS.

CPerkins
A: 

I'm not sure exactly what you need this for, but graphic-layout folks in the print industry have used Lorem Ipsum generators to create text that looks enough like real text that your brain processes it as such, without it actually being readable words. More info here

I don't know if there's a web service to which you could subscribe, but there are several sites which will just generate Lorem Ipsum strings for you, so you may be able to use those.

AllenG
+1  A: 

There is a good section on this in Programming Pearls. It's online, but I'd highly recommend buying the book; it's one of the best programming books around, in my opinion.

Adamski
A: 

Lots of Lorem Ipsum generators out there.

Chris Judge
A: 

It all gets back to why you want this. If you just want "pronounceable gibberish", I'd think the easiest thing to do would be to generate alternating consonants and vowels. That would be a tiny subset of all pronounceable gibberish, but what's the goal?

To get a slightly broader range, you could create a table of consonant phonemes and a table of vowel phonemes. The consonant list would include not just individual letters like "b" and "d" but also "th", "br", and so on, and the vowel list could include "oo", "ea", etc.

One more step would be to generate syllables instead of letters, with each syllable being a vowel alone, consonant-vowel, or consonant-vowel-consonant. That is, loop to create syllables, and for each syllable pick one of the three patterns. You probably want to forbid two vowel-only syllables in a row. (I'm trying to think of an example of that in English. It probably happens, but the only examples I can think of are borrowed from other languages, like "stoa".)
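A rough sketch of that last step, assuming tiny illustrative phoneme tables (the class name `PatternGibberish` and the table contents are hypothetical). Each syllable picks one of the three patterns, and the vowel-only pattern is skipped right after a vowel-only syllable:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical generator using V, CV, and CVC syllable patterns. */
class PatternGibberish {
    // Illustrative phoneme tables (assumptions, deliberately small).
    static final List<String> CONSONANTS =
        Arrays.asList("b", "d", "k", "m", "n", "s", "t", "th", "br");
    static final List<String> VOWELS =
        Arrays.asList("a", "e", "i", "o", "u", "oo", "ea");

    private static String pick(List<String> items, Random rnd) {
        return items.get(rnd.nextInt(items.size()));
    }

    /** Returns `count` syllables, never two vowel-only syllables in a row. */
    static List<String> syllables(int count, Random rnd) {
        List<String> out = new ArrayList<>();
        boolean previousWasVowelOnly = false;
        for (int i = 0; i < count; i++) {
            // Patterns: 0 = V, 1 = CV, 2 = CVC. Pattern 0 is excluded after a vowel-only syllable.
            int pattern = previousWasVowelOnly ? 1 + rnd.nextInt(2) : rnd.nextInt(3);
            StringBuilder sb = new StringBuilder();
            if (pattern >= 1) sb.append(pick(CONSONANTS, rnd));
            sb.append(pick(VOWELS, rnd));
            if (pattern == 2) sb.append(pick(CONSONANTS, rnd));
            out.add(sb.toString());
            previousWasVowelOnly = (pattern == 0);
        }
        return out;
    }

    /** Joins the syllables into a single string with no separators. */
    static String word(int count, Random rnd) {
        return String.join("", syllables(count, rnd));
    }

    public static void main(String[] args) {
        System.out.println(word(4, new Random()));
    }
}
```

Returning the syllables as a list before joining them makes the "no two vowel-only syllables in a row" rule easy to check in a test.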

Jay