How to generate 63 million prize codes

views:

774

answers:

+13 Q:

How to generate 63 million prize codes

Hiya All,

I am trying to come up with a "smart" and "secure" way of generating about 63million unique codes to be used in a competition. The codes will be about 10 characters long.

Has anyone done anything similar or aware of any "hurdles" that might arise from this issues? How do we minimize the probability of someone being able to guess the codes?

This project will be done in PHP, but that won't really matter, it's more of the logic that's an issue here.

Any feedback would be really appreciated.

UPDATE Just to clarify it will be 10 characters of case insensitive Alpha Numeric Values. A-Z0-9

Use a secure random number generator.

mP 2009-07-02 12:00:39

Probability of guessing: 63*10^6 / 10^10 = 1 / 159

kgiannakakis 2009-07-02 12:03:26

i dunno how someone +1 on this as it's a little unhelpful if not somewhat patronising :)

Shadi Almosri 2009-07-02 12:07:15

I don't think anybody said the characters must be numbers. [0-9a-z] gives 63*10^6 / 36^10 = 17/1000000000. [0-9a-Z] gives 63*10^6 / 62^10 = 75/1000000000000.

David Johnstone 2009-07-02 12:14:50

Alpha Numeric should be fine. but i think case insensitive would be best.

Shadi Almosri 2009-07-02 12:17:55

@David Johnstone: The answer only refers to random number generator, which can only give numeric results. Alpha-numeric codes is of course a solution, but you need something more than a random number generator.

kgiannakakis 2009-07-02 12:30:47

But random number generators are really random bit string generators, and these can be displayed any way you want - base 2, base 8, base 10, base 16, base 36...

David Johnstone 2009-07-02 12:36:26

Yes, but that's not part of the answer and it shouldn't be considered obvious.

kgiannakakis 2009-07-02 12:51:29

@ShadiA secure random number generator satisfies your requirements. Use it to generate your magic numbers and keep track of uniqueness. Its a very simple problem.

mP 2009-07-03 05:04:59

Doesn't need to be secure, in fact. There's one code per product sold, with no discernable order. Therefore it's practically infeasible to obtain a sequence. Without a sequence of observations, all RNGs are secure. Even http://xkcd.com/221/

MSalters 2009-09-09 14:41:26

if they are for unique participants, you can hash each participants name (and/or) details and then cut off till the first 10 characters.

OrangeRind 2009-07-02 12:01:39

Hashes, technically, may not always be unique though.

Jonathan Sampson 2009-07-02 12:02:05

The codes will be distributed on products, and then claimed, so there is no connection with a participant to create the values.

Shadi Almosri 2009-07-02 12:04:03

@Shadi - oh. my apologies then.@Jonathan - sure, thats why I asked him to add more details into making the hash to make the input string long enough. although I don't know how high would the probability of clashing in 63m would be.But still Other answers are much better. I also didn't know UUIDs.:)

OrangeRind 2009-07-02 12:09:12

Maybe this will help you a little: Universal Unique Identifier

The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else.

Daff 2009-07-02 12:02:02

repair the linkyou forgot to add the last 'r':)

OrangeRind 2009-07-02 12:03:34

In UUID v4 this is just a waste of entropy — you are throwing most of data away. In other UUID versions, the data may be not so random (NIC MAC address and system time — or hashes of them — are not really good random data sources).

drdaeman 2009-07-02 12:44:36

Ups thanks... damn Copy ) Yeah I know that it is actually pretty hard for a machine to generate a big amount of random numbers. The best for that big of an amount of really random data would probably be to get some measurement data from a physical experiment (radioactive decay) cause these are supposed to be naturally random (at least more than a machine generator).

Daff 2009-07-02 13:14:27

+2 A:

See this link for generating alpha-numeric strings in PHP. It uses an alphabet of 36 characters, which should be secure enough. However uniqueness isn't guaranteed. I guess you could use a Set to implement this. Since this is an one-time operation only, the time delay for testing for duplicates shouldn't be a big deal.

kgiannakakis 2009-07-02 12:15:10

Yes you're right, as the data is being setup it's not too hard to check if a code exists when placing it in the DB, and discarding if it doesn't.

Shadi Almosri 2009-07-02 12:19:06

It's faster to generate 64 million codes, sort, and check for duplicates. Only O(N log N).

MSalters 2009-07-02 12:23:21

@MSlaters: How would you do the duplicate check?

Shadi Almosri 2009-07-02 12:46:52

I think this is the best solution so far, modifying the function to only include characters that aren't confusing to the user.

Shadi Almosri 2009-07-02 12:49:00

You will compare the next value with the previous one.

kgiannakakis 2009-07-02 12:52:20

+5 A:

Generate a set of truly random, unique 64-bit numbers over the range 0 - 2⁵⁰-1. You'll need to keep track of the ones you have seen and reject duplicates. Use each 5 bits of the lower 50 bits of this number pull from a 32-character alphabet -- basically all the letters in the English alphabet (upper or lowercase) minus L and O plus the digits 2-9 (this reduces confusion between l/1 and 0/O). For 63 million codes, this would give you a 0.000006% probability (63,000,000/2⁵⁰) of choosing a valid code sequence randomly.

I've also done this using an autogenerated, primary key (int) and bit-interleaving it with a 32-bit random value. In this case I used the full 64-bits to generate 13 characters from the alphabet and added two random characters at fixed positions for a 15-character code. When redeeming the code you reverse the algorithm to extract the key and the randomness, throwing away the two extra random characters, then compare the randomness to that found stored with the key to validate the code.

tvanfosson 2009-07-02 12:17:25

+9 A:

If I understand you correctly, you want to create 63 milion codes of 10 digits that have a low "guess factor".

There are 10,000,000,000 valid combinations. Of these 63,000,000 are price numbers. 63 / 10,000 = 0.0063. So each guess has a 0,63% chance of success. Does not sound high, but with brute force, the numbers are quite easy to obtain.

Are you sure the 63 on 10,000 ratio is good enough?

Gamecat 2009-07-02 12:20:19

This is a good point.

Mark Cooper 2009-07-02 12:24:31

A good reason not to use decimal digits; see kgiannakakis

MSalters 2009-07-02 12:25:55

He specifically said characters, not digits. A good warning, but I think he already gets that.

tvanfosson 2009-07-02 12:27:33

He said in response to another answer that alpha-numeric would most likely be used.

David Johnstone 2009-07-02 12:31:33

They will be Alpha Numeric...

Shadi Almosri 2009-07-02 12:31:41

Also in place will be a form of anti brute force protection....

Shadi Almosri 2009-07-02 12:33:23

The alpha part was not clear at first. Now the guess chance is 0.0000017%. Looks a lot better to me ;-).

Gamecat 2009-07-02 12:40:25

You say the codes are 10 'characters' long, but what is your character set?

If it's just digits, then (@Gamecat) it'll probably be a little too easy to randomly guess a code.

On the other hand, if the character set is letters + digits, then you've got plenty of safety.

In any case generate using secure random number generator, and check for duplicates before putting in a database.

Douglas Leeder 2009-07-02 12:27:58

+22 A:

Syntax:

You probably will have people copying these codes, so that means these codes should be easy to copy. 10^10 is too small, as Gamecat points out. kgiannakakis has a better idea, but that causes another problem: "1" looks a lot like "I". "0", "C", "O" and "Q" are also quite similar. This is not a big problem. Define a safe alfabet: "0123456789ABDEFGHJKLMNPRSTUVXYZ" (leaves out COIQ) From the comments: depending on the fonts you pick, 5/S and U/V may also be visually ambiguous; replace as required. This is a 32 symbol (5 bit) code. A 10 character code is a 50 bits number. Those should be fairly trivial to generate, sort, copy, compare etc. Chances of being guessed are about 0.63E-7

Since the codes are too long to remember, users will need a resting point when copying them. So split the string in two or three parts and make sure the input field matches this breakdown.

E.g. AKG3L-45TEE => two groups of 5, and even if you can't remember 5 chars it's a lot easier to find the point back where you stopped reading.

How to generate them:

This is fairly simple. You don't need a particularly sophistciated algorithm to generate candidates. You can generate 10 random numbers per code needed, take 5 bits from each number (typically the middle bits are best, e.g. (rand()/64) modulo 32). Use this value [0-31] as index into your alphabet. Create a database table with this string as a primary key, and insert until the table has 63 million entries. You probably want to add "generated on" and "redeemed on" dates to this table.

MSalters 2009-07-02 12:31:41

+1 Good point on the removing of confusing values

Shadi Almosri 2009-07-02 12:35:34

And having 32 symbols could simplify some things too

David Johnstone 2009-07-02 12:40:26

The letter ess 'S' and the number five '5' are also homomorphic.

dwhall 2009-07-02 13:02:10

The set of similar characters depends heavily on the font, in some fonts U and V can also be hard to distinguish.I'd also sacrifice a symbol worth of the codespace to error detection, so you can inform the user that he probably mistyped the code. This still leaves plenty of room to make guessing infeasible if some trivial throttling is used.

Ants Aasma 2009-07-02 13:23:41

Thanks for this, i think with slight mod using info from other answers on here, this solution is resolved. Appreciated :)) thanks you!

Shadi Almosri 2009-07-03 09:23:07

+3 A:

Be careful when using alphanumerics for codes, as you don't want to accidentally generate anything confusing or embarrassing. To avoid confusion I suggest removing 1 and L, 0 and O, and maybe 8 and B. To avoid embarrassment consider removing all vowels so that you cannot accidentally spell anything (use your imagination here).

Alan 2009-07-02 12:36:15

ansaurus

tags:

views:

answers:

How to generate 63 million prize codes

related questions