ansaurus

Question

smart way to generate unique random number

Answer 1

+1 A:

For the extremely limited size of your numbers no you cannot expect uniqueness for any type of random generation.

You are generating a 32bit integer, whereas to reach uniqueness you need a much larger number in terms around 128bit which is the size GUIDs use which are guaranteed to always be globally unique.

Chris Marisic 2010-09-02 12:25:37

Note that with large enough numbers just the probability of a collision decreases. Theoretically, there is no guarantee of uniqueness, but for practical purposes you can assume that the probability is low enough.

0xA3 2010-09-02 12:32:13

Due to the nature of the GUID algorithm theoretically is not relevant to reality.

Chris Marisic 2010-09-02 12:42:16

Indeed, given 1. A smart algorithm, and 2. All parties involved using that algorithm It's quite easy to guarantee uniqueness. Point 2 is important though, if someone chooses to use another algorithm then all bets are off. Note that there are multiple algorithms for GUID generation and if you mix them then your GUIDS might not be so GU after all.

Stijn de Witt 2010-09-03 12:12:27

Answer 2

+1 A:

You could put a unique constraint on the column that contains the random number, then handle any constraint voilations by regenerating the number. I think this normally indexes the column as well so this would be faster.

You've tagged the question with C#, so I'm guessing you're using C# to generate the random number. Maybe think about getting the database to generate the random number in a stored proc, and return it.

BG100 2010-09-02 12:25:53

Isn't this really the same as what the OP suggested, and suffers the same issue/problem??

andora 2010-09-02 12:30:11

@andora: It is an optimization.

0xA3 2010-09-02 12:32:50

@andora: This is no other way to do it to guarentee no duplicates is there?

BG100 2010-09-02 12:33:17

Yes there is, don't use the random number as the value. Generate your set of possible random numbers with a used flag, select all unused values and select a record at random from that set, set the used flag and update the set. The random element guarantees an unused available value that's just as random as the OP and your approach but without possibly loops generating random number that have been used already.

Lazarus 2010-09-02 12:36:17

@Lazarus: Yep! Didn't think of that.

BG100 2010-09-02 12:41:40

@Lazarus: like that idea

Michel 2010-09-03 10:12:15

@BG100, nice idea, to let sql server do the try-catch behaviour

Michel 2010-09-03 10:14:34

Answer 3

+1 A:

You could just place your numbers in a set. If the size of the set after generation of your N numbers is too small, generate some more.

Do some trial runs. How many numbers do you have to generate on average? Try to find out an optimal solution to the tradeoff "generate too many numbers" / "check too often for duplicates". This optimal is a number M, so that after generating M numbers, your set will likely hold N unique numbers.

Oh, and M can also be calculated: If you need an extra number (your set contains N-1), then the chance of a random number already being in the set is (N-1)/R, with R being the range. I'm going crosseyed here, so you'll have to figure this out yourself (but this kinda stuff is what makes programming fun, no?).

Daren Thomas 2010-09-02 12:29:28

If you're going to store all the 99,999,999 possible numbers in some sort of collection and/or table then why not just generate them all sequentially to start with and shuffle them?

LukeH 2010-09-02 14:30:12

@LukeH - from OP, I gathered he only wanted a few numbers, small compared to the whole range.

Daren Thomas 2010-09-02 14:51:06

@Daren: thats correct, i think i want 1000 numbers a month

Michel 2010-09-03 10:13:06

@Michel, but does the uniqueness constraint extend to previous months?

Daren Thomas 2010-09-03 11:43:18

Answer 4

+6 A:

You could build a table with all the possible numbers in it, give the record a 'used' field.

Select all records that have not been 'used'
Pick a random number (r) between 1 and record count
Take record number r
Get your 'random value' from the record
Set the 'used' flag and update the db.

That should be more efficient than picking random numbers, querying the database and repeat until not found as that's just begging for an eternity for the last few values.

Lazarus 2010-09-02 12:31:22

Note that this is an O(n^2) algorithm for shuffling n numbers. (Identifying the non-used set is O(n), and you're doing it O(n) times.) There are O(n) and O(n lg n) algorithms for shuffling presented here; if you're going to go to all the trouble of keeping a database table with millions of entries, why not just shuffle it quickly in the first place?

Eric Lippert 2010-09-02 14:09:22

This one has set my mind in the final direction. I've done almost the same, only i did the random in code, so i generated 100000 insert statements with random numbers, so in sql i just pick the next one.

Michel 2010-09-11 19:57:56

Answer 5

+14 A:

You can use either an Linear Congruential Generator (LCG) or Linear Feedback Shift Register (LFSR). Google or wikipedia for more info.

Both can, with the right parameters, operate on a 'full-cycle' (or 'full period') basis so that they will generate a 'psuedo-random number' only once in a single period, and generate all numbers within the range. Both are 'weak' generators, so no good for cyptography, but perhaps 'good enough' for apparent randomness. You may have to constrain the period to work within your 'decimal' maximum as having 'binary' periods is necessary.

Update: I should add that it is not necessary to pre-calculate or pre-store previous values in any way, you only need to keep the previous seed-value (single int) and calculate 'on-demand' the next number in the sequence. Of course you can save a chain of pre-calculated numbers to your DB if desired, but it isn't necessary.

andora 2010-09-02 12:37:21

Answer 6

A:

You could try shuffling the set of possible values then using them sequentially.

James 2010-09-02 12:37:33

Answer 7

+14 A:

How about creating a set all of possible numbers and simply randomising the order? You could then just pick the next number from the tail.

Each number appears only once in the set, and when you want a new one it has already been generated, so the overhead is tiny at the point at which you want one. You could do this in memory or the database of your choice. You'll just need a sensible locking strategy for pulling the next available number.

serg10 2010-09-02 12:38:48

The required range is 1 to 100 million. That's a 400 Megabyte list using 32 bit ints. Good luck with that.

JeremyP 2010-09-02 14:11:18

@Jeremy: My not-particularly-high-spec machine can generate *and* shuffle an array of those 99,999,999 ints in about 20 seconds. (And that's just using some non-optimised LINQ that I quickly cobbled together.)

LukeH 2010-09-02 14:28:41

Maybe use a trie for storing the numbers? That should bring down the space needed to store.

abhin4v 2010-09-02 14:41:28

Answer 8

+3 A:

Using this algorithm might be suitable, though it's memory consuming: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle Put the numbers in the array from 1 to 99999999 and do the shuffle.

Stefan 2010-09-02 12:39:23

Answer 9

A:

I like Lazarus's solution, but if you want to avoid effectively pre-allocating the space for every possible number, just store the used numbers in the table, but build an "unused numbers" list in memory by adding all possible numbers to a collection then deleting every one that's present in the database. Then select one of the remaining numbers and use that, adding it to the list in the database, obviously.

But, like I say, I like Lazaru's solution - I think that's your best bet for most scenarios.

gkrogers 2010-09-02 12:41:13

Answer 10

A:

the stupid way: build a table to record, store all the numble first, and them ,every time the numble used, and flag it as "used"

Matthew Ma 2010-09-02 12:45:11

Answer 11

+2 A:

Use Pseudo-random Number Generators.

For example - Linear Congruential Random Number Generator

(if increment and n are coprime, then code will generate all numbers from 0 to n-1):

    int seed = 1, increment = 3;
    int n = 10;

    int x = seed;
    for(int i = 0; i < n; i++)
    {
        x = (x + increment) % n;
        Console.WriteLine(x);
    }

Output: 4 7 0 3 6 9 2 5 8 1

Basic Random Number Generators

Mersenne Twister

Branimir 2010-09-02 12:52:20

too many magic constants in your code!!!

ammoQ 2010-09-02 13:37:48

Yes, you're right, I rewrite sample. Is it better now?

Branimir 2010-09-02 13:55:55

Increase n to 15 and the number 4 will occur three times. It appears that it's not enough for the seed and n to be coprime.

Jonas Elfström 2010-09-02 14:49:32

Typo, "increment" and "n" need to be coprime.

Branimir 2010-09-02 14:58:48

clever math but it if i'm not mistaken they'll always come out in arithmetic progressions

eds 2010-09-02 21:00:42

Yes, this is the simplest generator algorithm

Branimir 2010-09-03 08:02:04

@jonas: see wikipedia for details of selecting constants for LCG. there are other constraints for choosing the best constants.

andora 2010-09-10 20:24:11

I checked out http://en.wikipedia.org/wiki/Linear_congruential_generator but I must admit that it isn't immediately obvious to me if it's easy or quite hard to fulfill all three criteria.

Jonas Elfström 2010-09-10 21:23:20

Wow, what an answer. looked at the code a dozen of times, still don't understand it. It works however :)

Michel 2010-09-11 19:55:42

Answer 12

A:

System.Random rnd = new System.Random();
IEnumerable<int> numbers = Enumerable.Range(0, 99999999).OrderBy(r => rnd.Next());

This gives a randomly shuffled collection of ints in your range. You can then iterate through the collection in order.

~~The nice part about this is that you're not actually creating the entire collection in memory.~~

See comments below - this will generate the entire collection in memory when you iterate to the first element.

mbeckish 2010-09-02 12:57:03

Can you explain why you believe this does not create the entire collection in memory? (Sorting requires at minimum identifying the smallest sort key; doing so requires looking at all of them.) I am interested in why people believe false things because it helps us design languages and libraries better. I assure you that not only does this create the entire collection in memory, it shuffles it in n lg n time. The fischer-yates shuffle on the other hand works in O(n) time and need not shuffle the entire collection. (Though the entire collection must be present in memory.)

Eric Lippert 2010-09-02 14:03:30

How do you get numbers out of it without getting an out of memory exception? `var number = numbers.First();` bombs.

Jonas Elfström 2010-09-02 14:15:36

@Eric Lippert - My mistake. I misinterpreted this from the description of OrderBy: "This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic."

mbeckish 2010-09-02 14:34:58

@Eric Lippert - I thought this meant that only the query was generated up front.

mbeckish 2010-09-02 14:37:44

@Eric Lippert - If you only used Enumerable.Range, and didn't call OrderBy, would the entire collection still be generated upon calling First()?

mbeckish 2010-09-02 14:45:48

To answer your first question: yes, the work is deferred, but so what? Just because a bomb doesn't explode immediately does not make it safe to trigger. The moment anyone asks for the work to be done the all the unsorted keys, generated values and sorted results will be created, and the latter two have to be all present in memory at once. (800 million bytes of doubles!) To answer your second question: no. Getting the first element only requires getting the first element. But getting the first sorted element requires getting all of them, so that you know which one comes first.

Eric Lippert 2010-09-02 14:53:25

You make a good point about the documentation being somewhat misleading. I'll mention that to the documentation manager next time I see her. Thanks!

Eric Lippert 2010-09-02 14:55:24

@Eric Lippert - I was under the mistaken impression that the point of these deferred execution, LINQ-y methods was so that each element wouldn't be generated until it was needed. In general, how does one know which queries will generate the entire collection upon first access of an element? Should we just use common sense to realize that all elements would need to be scanned to implement the query, or is there documentation that we can refer to?

mbeckish 2010-09-02 15:03:56

You are correct; the work is deferred until it is *needed*. And getting the first element of a sorted list *needs to look at all of them*. If you can figure out how to find the smallest element in a list without looking at all of them, I'd love to see it! To answer your question about documentation: as you've noted, the documentation is not as good as it could be. The implementations are not perfect either, but they're pretty good. But in general, use common sense: groupings, sorts, minima, maxima, and joins need to look at everything before they can know the first element.

Eric Lippert 2010-09-02 15:20:29

Answer 13

+1 A:

You could try giving writing usernames by using a starting number and an incremental number. You start at a number (say, 12000), then, for each account created, the number goes up by the incremental value.

id = startValue + (totalNumberOfAccounts * inctrementalNumber)

If incrementalNumber is a prime value, you should be able to loop around the max account value and not hit another value. This creates the illusion of a random id, but should also have very little conflicts. In the case of a conflicts, you could add a number to increase when there's a conflict, so the above code becomes. We want to handle this case, since, if we encounter one account value that is identical, when we increment, we will bump into another conflict when we increment again.

id = startValue + (totalNumberOfAccounts * inctrementalNumber) + totalConflicts

VerticalEvent 2010-09-02 13:19:43

Answer 14

+2 A:

In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at

The Art of Computer Programming, Volume 2: Seminumerical Algorithms

by Donald E. Knuth. Chapter 3 is all about random numbers.

Rune 2010-09-02 13:46:57

Answer 15

A:

function getShuffledNumbers(count) {
 var shuffledNumbers = new Array();
 var choices = new Array();
 for (var i = 0; i<count; i++) {
  // choose a number between 1 and amount of numbers remaining
  choices[i] = selectedNumber = Math.ceil(Math.random()*(99999999 - i));
  // Now to figure out the number based on this selection, work backwards until
  // you figure out which choice this number WOULD have been on the first step
  for (var j = 0; j < i; j++) {
   if (choices[i - 1 - j] >= selectedNumber) {
    // This basically says "it was choice number (selectedNumber) on the last step,
    // but if it's greater than or equal to this, it must have been choice number
    // (selectedNumber + 1) on THIS step."
    selectedNumber++;
   }
  }
  shuffledNumbers[i] = selectedNumber;
 }
 return shuffledNumbers;
}

This is as fast a way I could think of and only uses memory as it needs, however if you run it all the way through it will use double as much memory because it has two arrays, choices and shuffledNumbers.

eds 2010-09-02 17:52:11

Answer 16

A:

Running a linear congruential generator once to generate each number is apt to produce rather feeble results. Running it through a number of iterations which is relatively prime to your base (100,000,000 in this case) will improve it considerably. If before reporting each output from the generator, you run it through one or more additional permutation functions, the final output will still be a duplicate-free permutation of as many numbers as you want (up to 100,000,000) but if the proper functions are chosen the result can be cryptographically strong.

supercat 2010-09-02 20:22:46

Answer 17

A:

create and store ind db two shuffled versions(SHUFFLE_1 and SHUFFLE_2) of the interval [0..N), where N=10'000;
whenever a new order is created, you assign its id like this:

ORDER_FAKE_INDEX = N*SHUFFLE_1[ORDER_REAL_INDEX / N] + SHUFFLE_2[ORDER_REAL_INDEX % N]

Andy 2010-09-03 10:44:53

Answer 18

+1 A:

I've had to do something like this before (create a "random looking" number for part of a URL). What I did was create a list of keys randomly generated (max of 62 (a-z, A-Z, 0-9)). Each time it needed a new number it simply randomly selected a number from keys.Count and XOR the key and the given sequence number, then outputted XORed value (in base 62) prefixed with the keys index (in base 62). I also check the output to ensure it does not contain any naught words. If it does simply take the next key and have a second go. Decrypting the number is equally simple (the first digit is the index to the key to use, a simple XOR and you are done).

I like andora's answer if you are generating new numbers and might have used it had I known.

mlk 2010-09-03 11:11:32

ansaurus

tags:

views:

answers:

smart way to generate unique random number

related questions