views:

334

answers:

5

I am researching methods to generate a random human friendly code but not (easily) guessable. This will be used to give away prizes (think unique discount codes). We are to generate about 50k. Are there any standard methods/algorithms to accomplish this? I was thinking of using a GUID and applying CRC. Is this a bad idea?

Using .netframework 3.5 if it matters.

+1  A: 

I've used Base32 before for this sort of thing.

You just need an alphabet that avoids troublesome characters like I, L, 1, 0, O, etc.

Edit: So, to clarify, just Base32 encode enough randomly generated bytes to give you the required number of characters.

Then you probably want split the output into groups of 4 for extra human readablity.

Edit 2: Another good idea would be to make the last letter a check digit - say sum of all previous bytes modulo 32.

Blorgbeard
thx, I will give Base32 a shot.
B Z
+4  A: 

I've generated human-friendly checksums by taking bits from an MD5 checksum and using them into index into a list of words. For example:

: nr@yorkie 7012 ; md5words /home/nr/.profile  
overextend moonscape cucumbers outsmarting

The code is about 40 lines of Lua not counting the word list, which is included in the script so as to produce identical results on every system.


EDIT:

In your application, you want to generate 50,000 keys. You can do it by something like this:

for ((i=1; i<=50000; i++))
do 
  echo "this is my secret phrase $i" | md5words
done

Using this procedure with a different secret phrase produces these keys:

Chisinau Phaethon customs Martina
commensurate freewill logical cambered
kamikazes Creighton Dobro's Alonzo
medallion's jesters goofy keystones
Anaxagoras martial Medina's Hon's
acclimatized chirping Cleopatra's mascaras
buoyant nuclear lumbering disagreements
dampens Philby cloak drollness

These keys are difficult to forge: the word list has almost 100,000 words on it, so there are 10^20 possible 4-word sequences. If you have 100,000 codes, the chance of somebody being able to guess a code at random are one in 10^15. If you put a throttle on the number of keys people are allowed to try, say one key every 0.3 seconds, you won't have a problem.

If I were deploying this idea in your application, I would prune the word list to something shorter, maybe only 10,000 words that are very commonly recognized. Even after losing a factor of 10^4 the numbers are vastly in your favor---the chance of guessing a key would be 1 in 100 billion.

Norman Ramsey
We need to generate about 50k codes, if we would use a list of words, wouldn't it be fairly easy to guess codes? Maybe I am not understanding your approach. Thx.
B Z
I've extended my answer to show that when you generate a sequence of words, you can easily create sequences that are very hard to guess.
Norman Ramsey
P.S. I will offer a bounty of 100 reputation to anyone who can guess the next key in the sequence.
Norman Ramsey
Thx Ramsey, will check it out
B Z
+1  A: 

A couple of ideas that leap to mind:

  • If length permits, you could always generate phrases by picking words from a dictionary. For a good example of this in practice, see Diceware.

  • If length is more critical, use a list of syllables and the result would be a nonsense but pronounceable word. You might need to filter the results to remove undesirable actual words, however.

  • If pronounceable is not required, but you want to be able to quickly verify a code as coming from your system, then a small CRC of a (salted?) counter packed into a byte array and encoded with base64 or similar would work. You can improve the human factors by keeping the code short, and by picking an encoding that eliminates similar letters (i.e. don't have 'O', 'o', '0', and 'Q' all in the table).

  • If the code needs to be longer than about 5 characters in practice, then consider adopting a standard punctuation scheme that breaks it into chunks. "A236re8ww1jkm" is much harder than "A236-re8wM-1jkz" to read and transcribe. The number five here is a wild guess... there is probably literature on the best length.

  • If this is really a crypto question, (i.e. if there is significant real value to be had from forging these codes) then consult a crypto expert for a best practices answer because rolling your own will cause grief eventually.

RBerteig
+2  A: 

My favorite method of creating human friendly, pronouncable, but ultimately meaningless and random words is Markov chains. Here are some links to help you along this path.

http://www.codinghorror.com/blog/archives/001132.html - Jeff Atwood's explanation (very good!)

http://en.wikipedia.org/wiki/Markov_chain - Wikipedia's entry on Markov chains is lengthy and goes over the math.

http://www.cs.bell-labs.com/cm/cs/pearls/sec153.html - From Programming Pearls, has Perl for a Markov chain generator.

http://www.xradiograph.com/WordSalad/ChainsOfLove - Has links to Markov chain generators.

http://www.jwz.org/dadadodo/ - Jamie Zawinski's implementation in C.

And remember that you have to feed these generators text to make the chains, then it'll generate words and sentences out of them.

Colin Curtin
I am not sure how this would work for producing small codes that can be given away.
B Z
See http://www.fourteenminutes.com/fun/words/index.cgi for a word generator like I'm talking about. You could use these as your small codes, which are more memorable than strings of random characters (if that's what you're looking for).
Colin Curtin
ok thx, will check it out
B Z
whoah -- I guess collating Markov links _is_ useful to somebody else! (I run xradiograph.com, above)
Michael Paulukonis
+2  A: 

Java Pronouncable Passwords is a site that generates random pronouncable words. The source is available,so you could port it to whatever system you need.

This should provide you with a code you can give to a user, one that they have a chance of actually remembering.

RodeoClown
thx, will check it out.
B Z