tags:

views:

780

answers:

18

Hi,

i want to generate a sequence of unique random numbers in the range of 00000001 to 99999999.

So the first one might be 00001010, the second 40002928 etc.

The easy way is to generate a random number and store it in the database, and every next time do it again and check in the database if the number already exists and if so, generate a new one, check it again, etc. But that doesn't look right, i could be regenerating a number maybe 100 times if the number of generated items gets large.

Is there a smarter way?

EDIT as allways i forgot to say WHY i wanted this, and it will probably make things clearer and maybe get an alternative, and it is: we want to generate an ordernumber for a booking, so we could just use 000001, 000002 etc. But we don't want to give the competitors a clue of how much orders are created (because it's not a high volume market, and we don't want them to know if we are on order 30 after 2 months or at order 100. So we want to have an order number which is random (yet unique)

+1  A: 

For the extremely limited size of your numbers no you cannot expect uniqueness for any type of random generation.

You are generating a 32bit integer, whereas to reach uniqueness you need a much larger number in terms around 128bit which is the size GUIDs use which are guaranteed to always be globally unique.

Chris Marisic
Note that with large enough numbers just the probability of a collision decreases. Theoretically, there is no guarantee of uniqueness, but for practical purposes you can assume that the probability is low enough.
0xA3
Due to the nature of the GUID algorithm theoretically is not relevant to reality.
Chris Marisic
Indeed, given 1. A smart algorithm, and 2. All parties involved using that algorithm It's quite easy to guarantee uniqueness. Point 2 is important though, if someone chooses to use another algorithm then all bets are off. Note that there are multiple algorithms for GUID generation and if you mix them then your GUIDS might not be so GU after all.
Stijn de Witt
+1  A: 

You could put a unique constraint on the column that contains the random number, then handle any constraint voilations by regenerating the number. I think this normally indexes the column as well so this would be faster.

You've tagged the question with C#, so I'm guessing you're using C# to generate the random number. Maybe think about getting the database to generate the random number in a stored proc, and return it.

BG100
Isn't this really the same as what the OP suggested, and suffers the same issue/problem??
andora
@andora: It is an optimization.
0xA3
@andora: This is no other way to do it to guarentee no duplicates is there?
BG100
Yes there is, don't use the random number as the value. Generate your set of possible random numbers with a used flag, select all unused values and select a record at random from that set, set the used flag and update the set. The random element guarantees an unused available value that's just as random as the OP and your approach but without possibly loops generating random number that have been used already.
Lazarus
@Lazarus: Yep! Didn't think of that.
BG100
@Lazarus: like that idea
Michel
@BG100, nice idea, to let sql server do the try-catch behaviour
Michel
+1  A: 

You could just place your numbers in a set. If the size of the set after generation of your N numbers is too small, generate some more.

Do some trial runs. How many numbers do you have to generate on average? Try to find out an optimal solution to the tradeoff "generate too many numbers" / "check too often for duplicates". This optimal is a number M, so that after generating M numbers, your set will likely hold N unique numbers.

Oh, and M can also be calculated: If you need an extra number (your set contains N-1), then the chance of a random number already being in the set is (N-1)/R, with R being the range. I'm going crosseyed here, so you'll have to figure this out yourself (but this kinda stuff is what makes programming fun, no?).

Daren Thomas
If you're going to store all the 99,999,999 possible numbers in some sort of collection and/or table then why not just generate them all sequentially to start with and shuffle them?
LukeH
@LukeH - from OP, I gathered he only wanted a few numbers, small compared to the whole range.
Daren Thomas
@Daren: thats correct, i think i want 1000 numbers a month
Michel
@Michel, but does the uniqueness constraint extend to previous months?
Daren Thomas
+6  A: 

You could build a table with all the possible numbers in it, give the record a 'used' field.

  1. Select all records that have not been 'used'
  2. Pick a random number (r) between 1 and record count
  3. Take record number r
  4. Get your 'random value' from the record
  5. Set the 'used' flag and update the db.

That should be more efficient than picking random numbers, querying the database and repeat until not found as that's just begging for an eternity for the last few values.

Lazarus
Note that this is an O(n^2) algorithm for shuffling n numbers. (Identifying the non-used set is O(n), and you're doing it O(n) times.) There are O(n) and O(n lg n) algorithms for shuffling presented here; if you're going to go to all the trouble of keeping a database table with millions of entries, why not just shuffle it quickly in the first place?
Eric Lippert
This one has set my mind in the final direction. I've done almost the same, only i did the random in code, so i generated 100000 insert statements with random numbers, so in sql i just pick the next one.
Michel
+14  A: 

You can use either an Linear Congruential Generator (LCG) or Linear Feedback Shift Register (LFSR). Google or wikipedia for more info.

Both can, with the right parameters, operate on a 'full-cycle' (or 'full period') basis so that they will generate a 'psuedo-random number' only once in a single period, and generate all numbers within the range. Both are 'weak' generators, so no good for cyptography, but perhaps 'good enough' for apparent randomness. You may have to constrain the period to work within your 'decimal' maximum as having 'binary' periods is necessary.

Update: I should add that it is not necessary to pre-calculate or pre-store previous values in any way, you only need to keep the previous seed-value (single int) and calculate 'on-demand' the next number in the sequence. Of course you can save a chain of pre-calculated numbers to your DB if desired, but it isn't necessary.

andora
A: 

You could try shuffling the set of possible values then using them sequentially.

James
+14  A: 

How about creating a set all of possible numbers and simply randomising the order? You could then just pick the next number from the tail.

Each number appears only once in the set, and when you want a new one it has already been generated, so the overhead is tiny at the point at which you want one. You could do this in memory or the database of your choice. You'll just need a sensible locking strategy for pulling the next available number.

serg10
The required range is 1 to 100 million. That's a 400 Megabyte list using 32 bit ints. Good luck with that.
JeremyP
@Jeremy: My not-particularly-high-spec machine can generate *and* shuffle an array of those 99,999,999 ints in about 20 seconds. (And that's just using some non-optimised LINQ that I quickly cobbled together.)
LukeH
Maybe use a trie for storing the numbers? That should bring down the space needed to store.
abhin4v
+3  A: 

Using this algorithm might be suitable, though it's memory consuming: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle Put the numbers in the array from 1 to 99999999 and do the shuffle.

Stefan
A: 

I like Lazarus's solution, but if you want to avoid effectively pre-allocating the space for every possible number, just store the used numbers in the table, but build an "unused numbers" list in memory by adding all possible numbers to a collection then deleting every one that's present in the database. Then select one of the remaining numbers and use that, adding it to the list in the database, obviously.

But, like I say, I like Lazaru's solution - I think that's your best bet for most scenarios.

gkrogers
A: 

the stupid way: build a table to record, store all the numble first, and them ,every time the numble used, and flag it as "used"

Matthew Ma
+2  A: 

Use Pseudo-random Number Generators.

For example - Linear Congruential Random Number Generator

(if increment and n are coprime, then code will generate all numbers from 0 to n-1):

    int seed = 1, increment = 3;
    int n = 10;

    int x = seed;
    for(int i = 0; i < n; i++)
    {
        x = (x + increment) % n;
        Console.WriteLine(x);
    }

Output: 4 7 0 3 6 9 2 5 8 1

Basic Random Number Generators

Mersenne Twister

Branimir
too many magic constants in your code!!!
ammoQ
Yes, you're right, I rewrite sample. Is it better now?
Branimir
Increase n to 15 and the number 4 will occur three times. It appears that it's not enough for the seed and n to be coprime.
Jonas Elfström
Typo, "increment" and "n" need to be coprime.
Branimir
clever math but it if i'm not mistaken they'll always come out in arithmetic progressions
eds
Yes, this is the simplest generator algorithm
Branimir
@jonas: see wikipedia for details of selecting constants for LCG. there are other constraints for choosing the best constants.
andora
I checked out http://en.wikipedia.org/wiki/Linear_congruential_generator but I must admit that it isn't immediately obvious to me if it's easy or quite hard to fulfill all three criteria.
Jonas Elfström
Wow, what an answer. looked at the code a dozen of times, still don't understand it. It works however :)
Michel
A: 
System.Random rnd = new System.Random();
IEnumerable<int> numbers = Enumerable.Range(0, 99999999).OrderBy(r => rnd.Next());

This gives a randomly shuffled collection of ints in your range. You can then iterate through the collection in order.

The nice part about this is that you're not actually creating the entire collection in memory.

See comments below - this will generate the entire collection in memory when you iterate to the first element.

mbeckish
Can you explain why you believe this does not create the entire collection in memory? (Sorting requires at minimum identifying the smallest sort key; doing so requires looking at all of them.) I am interested in why people believe false things because it helps us design languages and libraries better. I assure you that not only does this create the entire collection in memory, it shuffles it in n lg n time. The fischer-yates shuffle on the other hand works in O(n) time and need not shuffle the entire collection. (Though the entire collection must be present in memory.)
Eric Lippert
How do you get numbers out of it without getting an out of memory exception? `var number = numbers.First();` bombs.
Jonas Elfström
@Eric Lippert - My mistake. I misinterpreted this from the description of OrderBy: "This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic."
mbeckish
@Eric Lippert - I thought this meant that only the query was generated up front.
mbeckish
@Eric Lippert - If you only used Enumerable.Range, and didn't call OrderBy, would the entire collection still be generated upon calling First()?
mbeckish
To answer your first question: yes, the work is deferred, but so what? Just because a bomb doesn't explode immediately does not make it safe to trigger. The moment anyone asks for the work to be done the all the unsorted keys, generated values and sorted results will be created, and the latter two have to be all present in memory at once. (800 million bytes of doubles!) To answer your second question: no. Getting the first element only requires getting the first element. But getting the first sorted element requires getting all of them, so that you know which one comes first.
Eric Lippert
You make a good point about the documentation being somewhat misleading. I'll mention that to the documentation manager next time I see her. Thanks!
Eric Lippert
@Eric Lippert - I was under the mistaken impression that the point of these deferred execution, LINQ-y methods was so that each element wouldn't be generated until it was needed. In general, how does one know which queries will generate the entire collection upon first access of an element? Should we just use common sense to realize that all elements would need to be scanned to implement the query, or is there documentation that we can refer to?
mbeckish
You are correct; the work is deferred until it is *needed*. And getting the first element of a sorted list *needs to look at all of them*. If you can figure out how to find the smallest element in a list without looking at all of them, I'd love to see it! To answer your question about documentation: as you've noted, the documentation is not as good as it could be. The implementations are not perfect either, but they're pretty good. But in general, use common sense: groupings, sorts, minima, maxima, and joins need to look at everything before they can know the first element.
Eric Lippert
+1  A: 

You could try giving writing usernames by using a starting number and an incremental number. You start at a number (say, 12000), then, for each account created, the number goes up by the incremental value.

id = startValue + (totalNumberOfAccounts * inctrementalNumber)

If incrementalNumber is a prime value, you should be able to loop around the max account value and not hit another value. This creates the illusion of a random id, but should also have very little conflicts. In the case of a conflicts, you could add a number to increase when there's a conflict, so the above code becomes. We want to handle this case, since, if we encounter one account value that is identical, when we increment, we will bump into another conflict when we increment again.

id = startValue + (totalNumberOfAccounts * inctrementalNumber) + totalConflicts
VerticalEvent
+2  A: 

In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at

The Art of Computer Programming, Volume 2: Seminumerical Algorithms

by Donald E. Knuth. Chapter 3 is all about random numbers.

Rune
A: 
function getShuffledNumbers(count) {
 var shuffledNumbers = new Array();
 var choices = new Array();
 for (var i = 0; i<count; i++) {
  // choose a number between 1 and amount of numbers remaining
  choices[i] = selectedNumber = Math.ceil(Math.random()*(99999999 - i));
  // Now to figure out the number based on this selection, work backwards until
  // you figure out which choice this number WOULD have been on the first step
  for (var j = 0; j < i; j++) {
   if (choices[i - 1 - j] >= selectedNumber) {
    // This basically says "it was choice number (selectedNumber) on the last step,
    // but if it's greater than or equal to this, it must have been choice number
    // (selectedNumber + 1) on THIS step."
    selectedNumber++;
   }
  }
  shuffledNumbers[i] = selectedNumber;
 }
 return shuffledNumbers;
}

This is as fast a way I could think of and only uses memory as it needs, however if you run it all the way through it will use double as much memory because it has two arrays, choices and shuffledNumbers.

eds
A: 

Running a linear congruential generator once to generate each number is apt to produce rather feeble results. Running it through a number of iterations which is relatively prime to your base (100,000,000 in this case) will improve it considerably. If before reporting each output from the generator, you run it through one or more additional permutation functions, the final output will still be a duplicate-free permutation of as many numbers as you want (up to 100,000,000) but if the proper functions are chosen the result can be cryptographically strong.

supercat
A: 
  1. create and store ind db two shuffled versions(SHUFFLE_1 and SHUFFLE_2) of the interval [0..N), where N=10'000;

  2. whenever a new order is created, you assign its id like this:

ORDER_FAKE_INDEX = N*SHUFFLE_1[ORDER_REAL_INDEX / N] + SHUFFLE_2[ORDER_REAL_INDEX % N]

Andy
+1  A: 

I've had to do something like this before (create a "random looking" number for part of a URL). What I did was create a list of keys randomly generated (max of 62 (a-z, A-Z, 0-9)). Each time it needed a new number it simply randomly selected a number from keys.Count and XOR the key and the given sequence number, then outputted XORed value (in base 62) prefixed with the keys index (in base 62). I also check the output to ensure it does not contain any naught words. If it does simply take the next key and have a second go. Decrypting the number is equally simple (the first digit is the index to the key to use, a simple XOR and you are done).

I like andora's answer if you are generating new numbers and might have used it had I known.

mlk