I'm working with a client that needs to generate millions of the alphanumeric codes used in magazine scratch-off cards, bottlecap prizes, and so on. They have to be short enough to print on a cap, they want to make sure that ambiguous characters like 1 and I, 0 and O, etc. are not included, and they have to be explicitly stored for future use -- we can't just have an algorithm that determines 'validity' when someone tries to redeem one. Finally, they want to make sure that the codes are randomly distributed inside of a large "code space" so that people can't just guess additional codes by walking through the alphabet.

Are there any pointers towards reasonably efficient algorithms for generating these kinds of code sets? I've scratched a few out on the back of an envelope, but this problem smells like a trap for the unwary.

A: 

Try this.

Gaurav
+3  A: 

Let's suppose you can use a character set of, say, 40 symbols drawn from unambiguous upper-case, lower-case and numeric characters.

For a sequence of n chars, you've got 40^n combinations:

  • 40^4 = 2,560,000
  • 40^5 = 102,400,000
  • 40^6 = 4,096,000,000
  • 40^7 = 163,840,000,000
  • 40^8 = 6,553,600,000,000

Thus 8 chars gives a pretty good space to work in. If you generated 10 million codes, an attacker guessing at random would need about 655,000 tries (6,553,600,000,000 / 10,000,000) on average to hit a valid one.
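The arithmetic above is easy to reproduce. This is a small sketch that assumes the answer's figures: 40 symbols, 8-character codes and 10 million issued codes. `key_space` and `guesses_per_hit` are illustrative helper names. The second helper uses the fact that for uniformly random guesses the expected number of tries per hit is the key space divided by the number of issued codes.

```python
ALPHABET_SIZE = 40  # assumption taken from the answer: 40 unambiguous symbols


def key_space(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct codes of the given length."""
    return alphabet_size ** length


def guesses_per_hit(length: int, issued: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Expected number of uniformly random guesses needed to hit one issued code."""
    return key_space(length, alphabet_size) / issued


# Reproduce the table of 40^n values for n = 4..8
for n in range(4, 9):
    print(n, f"{key_space(n):,}")

print(f"{guesses_per_hit(8, 10_000_000):,.0f}")  # 655,360
```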

Or you can come at it from the other direction: given the number of possible codes, how many codes should you generate to avoid the trap they call the Birthday Paradox?

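Taking the 8-char code, 6,553,600,000,000 is approximately 2^42 (more precisely about 2^42.6). Duplicates become likely once you approach the square root of the code space, so you might reasonably generate 2^21 codes from it, or 2,097,152.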
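Here is a sketch of the birthday estimate. It uses the standard approximation P(at least one duplicate) ≈ 1 - exp(-k(k-1)/2N) for k codes drawn uniformly at random from N possibilities. The function names are illustrative, and N = 40^8 is the 8-character space from the answer.

```python
import math


def expected_collisions(k: int, n: int) -> float:
    """Expected number of duplicate pairs among k codes drawn uniformly from n."""
    return k * (k - 1) / (2 * n)


def p_any_collision(k: int, n: int) -> float:
    """Approximate probability of at least one duplicate: 1 - exp(-k(k-1)/2n)."""
    return -math.expm1(-expected_collisions(k, n))


N = 40 ** 8
print(round(math.log2(N), 2))  # 42.58
print(math.isqrt(N))           # 2560000, i.e. exactly 40**4

for k in (2 ** 21, 10_000_000):
    print(k, round(expected_collisions(k, N), 2), round(p_any_collision(k, N), 3))
```

At 2^21 codes the expected number of duplicate pairs is about 0.34. At 10 million codes a duplicate is all but certain. That is harmless as long as duplicates are detected and replaced before the codes are printed.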

Paul Dixon
+1  A: 

Check out this question, which is almost the same: http://stackoverflow.com/questions/55218/unique-key-generation

Darryl Hein
+6  A: 

If you need about 10 million unique keys (for example), the best approach is to pick a key space that's vastly bigger and start generating keys at random. Read about the Birthday Paradox; it's the main thing you should be worried about. If you want 2^m unique and secure keys, make sure there are at least 2^(2m) possible values, i.e. the square of the number of keys. Here's a rough O(n log n) algorithm for n keys:

  • Use a key space of at least 2^50 (in other words, allow 2^50 possible unique values). You'll have barely any collisions in your entire dataset, and if you issue on the order of 2^25 keys, anyone brute-forcing them will have about even odds of getting a key after trying 2^25 of them.
  • Generate as many random numbers as you need.
  • Index the database on your key (this is the O(n lg n) step: the sort).
  • Page through the DB in key order and collect the duplicates, which are now adjacent (pseudocode below).
  • Delete the duplicate rows, and you're done.

Pseudocode:

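// getnext() is a hypothetical helper that returns the next key from the
// DB in sorted order, or null once the result set is exhausted.
$toDelete = [];
$last = null;
while (($current = getnext()) !== null) {
    if ($last === $current) {
        $toDelete[] = $current; // same as the previous key, so it's a duplicate
    }
    $last = $current;
}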
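The same approach (generate, sort, scan for adjacent duplicates) can be sketched in memory in Python. Several choices here are assumptions. The 32-symbol alphabet has no I, O, 0 or 1. The 10-character length was picked because 32^10 is exactly the 2^50 key space suggested above. `generate_unique_codes` is a hypothetical name. Unlike the steps above, it also generates replacements for any duplicates it drops, so the final count is exact.

```python
import secrets

# Assumption: unambiguous 32-symbol alphabet (no I, O, 0 or 1).
# 10 characters over 32 symbols gives exactly 32**10 == 2**50 possible codes.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
CODE_LENGTH = 10


def random_code() -> str:
    # secrets uses the OS CSPRNG, so codes cannot be predicted from earlier ones
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


def generate_unique_codes(count: int) -> list[str]:
    codes = [random_code() for _ in range(count)]
    while True:
        codes.sort()  # the O(n log n) step: duplicates become adjacent
        # Keep a code only if it differs from its predecessor (the scan above)
        unique = [c for i, c in enumerate(codes) if i == 0 or c != codes[i - 1]]
        missing = count - len(unique)
        if missing == 0:
            return unique
        # Top up with fresh codes and check again (rarely needed in a 2**50 space)
        codes = unique + [random_code() for _ in range(missing)]
```

The returned list is in sorted order, which reveals nothing because every code was drawn at random.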
ojrac
Instead of sorting and deleting duplicates, simply use a unique key.
Javier
My goal was specifically to avoid that, so you don't have to do a uniqueness check on every single row at INSERT time. Instead, you can insert the full list of data and sort it once, in O(n lg n). Unless the DB validates your unique key in less than O(lg n) per row, an un-indexed DB is what you want.
ojrac
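For comparison, the unique-key alternative can be sketched with Python's built-in `sqlite3` module. The table name `codes`, the batch size and the 32-symbol alphabet are all assumptions for illustration. The UNIQUE index costs O(log n) per insert, which is the trade-off discussed in the comments above.

```python
import secrets
import sqlite3

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # assumed unambiguous symbol set
CODE_LENGTH = 10  # 32**10 == 2**50 possible codes


def random_code() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


def fill_codes(conn: sqlite3.Connection, target: int, batch: int = 10_000) -> int:
    """Insert random codes until the (hypothetical) table holds `target` distinct ones."""
    conn.execute("CREATE TABLE IF NOT EXISTS codes (code TEXT NOT NULL UNIQUE)")
    count = conn.execute("SELECT COUNT(*) FROM codes").fetchone()[0]
    while count < target:
        rows = [(random_code(),) for _ in range(min(batch, target - count))]
        # The UNIQUE index rejects duplicates; OR IGNORE skips them instead of raising
        conn.executemany("INSERT OR IGNORE INTO codes (code) VALUES (?)", rows)
        conn.commit()
        count = conn.execute("SELECT COUNT(*) FROM codes").fetchone()[0]
    return count
```

Usage might look like `fill_codes(sqlite3.connect("codes.db"), 10_000_000)`, where the file name is a placeholder.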
A: 

Use a one-time password algorithm?

RFC 4226 details one (HOTP) based on the HMAC algorithm:

http://www.ietf.org/rfc/rfc4226.txt

But instead of encoding the result as base-10 digits (0-9), use base32.
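Here is a minimal sketch of that suggestion using only the Python standard library. `hotp_truncated` follows the HMAC-SHA-1 and dynamic-truncation steps of RFC 4226, and `hotp_decimal` is the RFC's usual 6-digit output. `hotp_code` instead writes the same 31-bit value in base 32. The alphabet is an assumption: a 32-symbol set without I, O, 0 or 1 rather than the RFC 4648 base32 alphabet. The function names are illustrative.

```python
import hashlib
import hmac

# Assumption: custom 32-symbol alphabet (no I, O, 0 or 1), not RFC 4648 base32.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def hotp_truncated(secret: bytes, counter: int) -> int:
    """RFC 4226: HMAC-SHA-1 over the 8-byte counter, then dynamic truncation to 31 bits."""
    digest = hmac.new(secret, counter.to_bytes(8, "big"), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    return int.from_bytes(digest[offset:offset + 4], "big") & 0x7FFFFFFF


def hotp_decimal(secret: bytes, counter: int, digits: int = 6) -> str:
    """Standard RFC 4226 output: truncated value mod 10**digits, zero-padded."""
    return str(hotp_truncated(secret, counter) % 10 ** digits).zfill(digits)


def hotp_code(secret: bytes, counter: int, length: int = 6) -> str:
    """Same truncated value written in base 32; 6 symbols keep its low 30 bits."""
    value = hotp_truncated(secret, counter)
    chars = []
    for _ in range(length):
        value, digit = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))
```

The secret has to stay private. Because the output is only 31 bits, different counters can also map to the same code, so the generated codes still need to be stored and de-duplicated, which the question requires anyway.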

A: 

Whatever method you use, I would suggest you add a check digit or two as a first-line defence against people mis-entering a code or trying to invent one.
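One common way to build a check symbol over a non-decimal alphabet is the Luhn mod N algorithm. That choice is the editor's, not the answer's. The sketch below assumes the same 32-symbol alphabet (no I, O, 0 or 1), and the function names are illustrative. It rejects every single-character substitution before any database lookup. It adds no secrecy, though, since anyone can compute it.

```python
# Assumption: unambiguous 32-symbol alphabet; Luhn mod N works with any alphabet.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def _luhn_sum(code: str, double_first: bool) -> int:
    """Luhn mod N sum: walk right to left, doubling every other symbol's index."""
    n = len(ALPHABET)
    total = 0
    double = double_first
    for ch in reversed(code):
        addend = ALPHABET.index(ch) * (2 if double else 1)
        # Base-n "digit sum" of the addend, as plain Luhn does in base 10
        total += addend // n + addend % n
        double = not double
    return total


def check_char(payload: str) -> str:
    """Symbol to append so that payload + symbol passes is_valid()."""
    n = len(ALPHABET)
    return ALPHABET[(n - _luhn_sum(payload, double_first=True) % n) % n]


def is_valid(code: str) -> bool:
    """Cheap pre-check; unknown symbols are rejected before the sum is taken."""
    return all(ch in ALPHABET for ch in code) and _luhn_sum(code, double_first=False) % len(ALPHABET) == 0
```

For example, a printed code would be `payload + check_char(payload)`, and the redemption form would call `is_valid` before querying the stored codes.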

staticsan
A: 

Oddly enough, with the following seed I was only able to generate 32 unique strings.

ABCDEFGHJKLMNPQRSTUVWXYZ23456789

With a longer seed I was able to generate many more: 40,000 unique strings, successfully.

ABCDEFGHJKLMNPQRSTUVWXYZ234567892345678923456789ABCDEFGHJKLMNPQRSTUVWXYZ234567892345678923456789ABCDEFGHJKLMNPQRSTUVWXYZ234567892345678923456789

kalinma