views:

663

answers:

11

When a user adds a new item in my system, I want to produce a unique non-incrementing pseudo-random 7-digit code for that item. The number of items created will only number in the thousands (<10,000).

Because it needs to be unique and no two items will have the same information, I could use a hash, but it needs to be a code they can share with other people - hence the 7 digits.

My original thought was just to loop the generation of a random number, check that it wasn't already used, and if it was, rinse and repeat. I think this is a reasonable if distasteful solution given the low likelihood of collisions.

Responses to this question suggest generating a list of all unused numbers and shuffling them. I could probably keep a list like this in a database, but we're talking 10,000,000 entries for something relatively infrequent.

Does anyone have a better way?

+2  A: 

i would suggest using a guid instead of a 7 digit code as it will be more unique and you don't have to worry about generateing them as .NET will do this for you.

harryovers
`System.Guid.NewGuid();`
harryovers
He said people have to share this number with other people. GUIDs are not ... shall we say ... *ideal* ... for that purpose.
T.J. Crowder
Yes, exactly. For a bit more background, this is essentially a "group" that people can join. It's difficult to tell people to join group a53df3d0-171f-11df-8a39-0800200c9a66.
Damovisa
@Damovisa: And you wouldn't want to join that group even if someone invited you to, would you? ;-) *ducks and runs*
T.J. Crowder
+4  A: 

Honestly, if you want to generate only a couple of thousand 7-digit codes, while 10 million different codes will be available, I think just generating a random one and checking for a collision is good enough.

The chance of a collision on the first hit will be, in the worst case scenario, about 1 in a thousand, and the computational effort to just generate a new 7-digit code and check for a collision again will be much smaller than keeping a dictionary, or similar solutions.

Using a GUID instead of a 7-digit code as harryovers suggested will also certainly work, but of course a GUID will be slightly harder to remember for your users.

Aistina
Whilst in reality it is unlikely to happen, never forget that the actual worst case scenario for generating in this way is infinite.
Robin Day
@Robin, luckily the chance for that is also infinitely small :)
Aistina
+2  A: 

All solutions for a "unique" ID must have a database somewhere: Either one which contains the used IDs or one with the free IDs. As you noticed, the database with free IDs will be pretty big so most often, people use a "used IDs" database and check for collisions.

That said, some databases offer a "random ID" generator/sequence which already returns IDs in a range in random order.

This works by using a random number generator which can create all numbers in a range without repeating itself plus the feature that you can save it's state somewhere. So what you do is run the generator once, use the ID and save the new state. For the next run, you load the state and reset the generator to the last state to get the next random ID.

Aaron Digulla
Thanks, you're right of course - I'll have to store that "unique" Id somewhere. Do you have any more information on database random sequence generators? I'm currently using SQL Server Express 2008.
Damovisa
@Damovisa: See SideShowCoder's answer about LFSRs.
Aaron Digulla
+5  A: 

You could use an incrementing ID and then XOR it on some fixed key.

const int XORCode = 12345;

private int Encode(int id)
{
    return id^XORCode;
}

private int Decode(int code)
{
    return code^XORCode;
}
Robin Day
I'll have to investigate this further, but that sounds like it could work... You're still likely to get clusters of codes though aren't you?
Damovisa
It depends on your ids and your choice of key. As you only expect ~10000 items then you should be able to play with the data and see what outcome you get.
Robin Day
A: 

With only thousands of items in the database, your original idea seems sound. Checking the existance of a value in a sorted (indexed) list of a few tens of thousands of items would only require a few data fetches and comparisons.

Pre-generating the list doesn't sound like a good idea, because you will either store way more numbers than are necessary, or you will have to deal with running out of them.

Jeffrey L Whitledge
Well, he has only seven digits -- he'll run out of them in the same time whether he stores them in a database or not. :-)
T.J. Crowder
Based on the question, it is unlikely that he will use up all ten million possible numbers. Therefore, it's wasteful to store all ten million. So how many does he need to store? A hundred thousand is also wasteful. Ten thousand is probably not enough. Finding the proper balance is a trade-off, and it doesn't sound like a good plan to me. I prefer solutions that are guaranteed to operate well under all potential scenerios.
Jeffrey L Whitledge
+2  A: 

I assume you'll have a table of the generated ones. In that case, I don't see a problem with picking random numbers and checking them against the database, but I wouldn't do it individually. Generating them is cheap, doing the DB query is expensive relative to that. I'd generate 100 or 1,000 at a time and then ask the DB which of those exists. Bet you won't have to do it twice most of the time.

T.J. Crowder
Not a bad idea...
Damovisa
+11  A: 

Pick a 7-digit prime number A, and a big prime number B, and

int nth_unique_7_digit_code(int n) {
    return (n * B) % A;
}

The count of all unique codes generated by this will be A.

If you want to be more "secure", do pow(some_prime_number, n) % A, i.e.

static int current_code = B;
int get_next_unique_code() {
   current_code = (B * current_code) % A;
   return current_code;
}
KennyTM
Do you have any background info on why this works?
spoulson
I'm not entirely sure what this is doing, but it looks vaguely similar to the important part of the RSA algorithm. I think.
rmeador
There is no point in picking a B that is larger than A.
GregS
It is elementary number theory, and it works because A is prime and the GCD(A,B) = 1. It guarantees no repeats, and the result "looks" random.
GregS
@GregS: Picking B >> A makes the pattern not so clear. For the `pow` method it doesn't matter.
KennyTM
Provided performance on this isn't terrible (I can't see it being too bad), I think I'll end up using this one. Thanks!
Damovisa
@KennyTM: your B and (B % A) are equivalent, so you might as well pick the smaller (B % A).
GregS
Note that given a few consecutive numbers you can easily calculate A and B, so this is in no way cryptographically secure. But if the goal is just to *look* random, then it should work as advertised.
Rasmus Faber
Thanks! I needed something like that. I used [node id] + [timestamp] + [the unique code]. All this nicely base58 encoded into 10 chars for my DB key. For the number of keys a node can generate I'm covered. Spanks alot!
Derick
A: 

Probability of having hits is very low.
For instance - you have 10^4 users and 10^7 possible IDs.
Probability that you pick used ID 10 times in row is now 10^-30.
This chance is lower than once in a lifetime of any person.

ralu
Ahem, birthday paradox.
KennyTM
It does not apply here. He should add the code for checking hits but there is low probability that he will have to try more than few times.
ralu
+2  A: 

You have <10.000 items, so you need only 4 digits to store a unique number for all items. Since you have 7 digits, you have 3 digits extra.

If you combine a unique sequence number of 4 digits with a random number of 3 digits, you will be unique and random. You increment the sequence number with every new ID you generate.

You can just append them in any order, or mix them.

seq = abcd, rnd = ABC

You can create the following ID's:

  • abcdABC
  • ABCabcd
  • aAbBcCd

If you use only one mixing algorithm, you will have unique numbers, that look random.

GvS
+1  A: 

I would try to use an LFSR (Linear feedback shift register) the code is really simple you can find examples everywhere ie Wikipedia and even though it's not cryptographically secure it looks very random. Also the implementation will be very fast since it's using mainly shift operations.

SideShowCoder
A: 

Well, you could ask the user to pick their own 7-digit number and validate it against the population of existing numbers (which you would have stored as they were used up), but I suspect you would be filtering a lot of 1234567, 7654321, 9999999, 7777777 type responses and might need a few RegExs to achieve the filtering, plus you'd have to warn the user against such sequences in order not to have a bad, repetitive, user input experience.

Gordon Mackie JoanMiro