Possible Duplicates:
Unique random numbers in O(1)?
How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N

I want to generate random numbers in a certain range, and I must be sure that each new number is not a duplicate of the previous ones. One solution is to store the previously generated numbers in a container and check each new number against it: if the number is already in the container, we generate again; otherwise we use it and add it to the container. But with each new number this operation becomes slower and slower. Is there a better approach, or any rand function that can work faster and still ensure uniqueness?

EDIT: Yes, there is a limit (for example, from 0 to 1,000,000,000). But I want to generate 100,000 unique numbers! (It would be great if the solution used Qt features.)

+2  A: 

It wouldn't be random if there were such a pattern, would it?

As far as I know, you would have to store and filter out all unwanted numbers...

FloE
+15  A: 

Is there a range for the random numbers? If you have a limit and you keep generating unique random numbers, you'll end up with a list of all numbers from x..y in random order, where x..y is the valid range of your random numbers. If this is the case, you can improve speed greatly by simply generating the list of all numbers in x..y and shuffling it, instead of generating the numbers one by one.
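
Since the question mentions Qt, here is a minimal sketch of that idea with Qt containers (assuming the Qt 4-era qsrand()/qrand()/qSwap() API; the helper name is mine):

#include <QVector>
#include <QtGlobal>  // qsrand(), qrand(), qSwap()

// Fill a QVector with 0..n-1, Fisher-Yates shuffle it using qrand(),
// and keep the first k entries as the unique "random" numbers.
QVector<int> uniqueRandoms(int n, int k)
{
    QVector<int> vals(n);
    for (int i = 0; i < n; ++i)
        vals[i] = i;
    for (int i = n - 1; i > 0; --i)
        qSwap(vals[i], vals[qrand() % (i + 1)]);  // modulo bias ignored for brevity
    vals.resize(k);
    return vals;
}

Call qsrand(someSeed) once beforehand. For the edited question's range (a billion values) the vector alone needs about 4 GB, so this only fits when the range is modest.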

jkramer
+8  A: 

Fill out a list with the numbers you need, then shuffle the list and pick your numbers from one end.

kasperjj
not very practical if he needs 32-bit numbers.
shoosh
He needs an index of all used numbers anyway.
GvS
true, but if that's truly his need, he should just start from 1 and iterate his way up while accepting that at least in theory, that's as random as any other order ;-)
kasperjj
@shoosh: the number of _bits_ doesn't matter, it's the number of _numbers_ that matters. (Why would that be a problem if I need 5 32bit numbers?)
sbi
@sbi: the problem is if he needs every number in the range of all possible 32-bit numbers
klez
This would be a lot faster than the original solution, and I can't think of many better ways. Perhaps if the numbers are very large you could map them to smaller numbers, or build a tree structure, much like you can roll a D10 with a D6 using low/high splits (first roll: 1-3 means the 1-5 half, 4-6 means the 6-10 half; then roll within the half, ignoring 6s)
SLC
@klez, for bits that take exactly two values.
shoosh
@kasperjj those were random.. till you said it
Hernán Eche
@klez: Yes, that would indeed be a problem. Except that I don't see it mentioned as a requirement.
sbi
+1  A: 

If they can't be repeated, they aren't random.

EDIT:

Furthermore...

if they can't be repeated, they don't fit in a finite computer

Hernán Eche
That's actually not true. You can define "random" as "random from the set of numbers that doesn't include the last number"
shoosh
you can redefine even the word redefine.
Hernán Eche
@Hernán: So the randomness is limited. However, that doesn't mean it's not random anymore. The number of bits in your random numbers limits their randomness, too. Still, you wouldn't say they aren't random just because you can't have more than 2^16 numbers.
sbi
@sbi, there is a difference between "can't be repeated" and "limited range". I have some old (six-sided, limited) dice, and they still work; if their numbers couldn't be repeated, I probably couldn't use them anymore within a week. Anyway, of course you are right that a limited range can be random, and I was being a little ironic in mentioning the finite computer
Hernán Eche
@Hernán: There's many [lotteries where every number can only be drawn once](http://www.youtube.com/watch?v=7upsy2t442w). You will surely agree that their numbers are random nevertheless?
sbi
@sbi: The approach gives the appearance of randomness, which may be all Narek needs. But as more numbers in the lottery are drawn, they are less random. Just like drawing from a deck of cards; only the first card drawn is random; the rest are dependent on previous draws.
jason.rickman
Shuffle a deck of cards. Now without looking, would you say that the top card in the deck is random? The bottom one? Certainly both are random. This does not change if you peek at the top card, as the order remains unchanged and therefore still random.
MSalters
@jason.rickman: Did you notice you wrote _"less_ random"? See my first comment above.
sbi
@sbi: Sorry, by "less random" I mean "more predictable". My argument was by reducing your domain space on every pull, you are reducing your entropy (i.e. increasing the certainty in the result). In a deck of 52 cards, once you have drawn 51, you are 100% certain what that last card will be.
jason.rickman
There is a difference between emitting a set of numbers in random order and emitting random numbers. I believe Narek wants the first, not the second. The contention in this comment thread is that the numbers in that case are not truly random.
jason.rickman
@jason.rickman: Yes, I know that "less random" means "more predictable". But it doesn't mean "not random" and it doesn't mean "predictable". Which is exactly what I wrote: "the randomness is limited". As for the amount: with a deck of cards you have ~2^6 values to pick from, with 32-bit integers it's 2^32, and with what Narek needs probably something in between. Nevertheless, it's still random. (And as for picking the last of 52 cards: that's only a problem if you need to pick all 52 cards. If you pick 5 out of 52, the last one is still less predictable than the result of two dice.)
sbi
+1  A: 

You could store the candidate numbers in a vector and pick them by index (0..n-1). After each draw, remove the chosen number from the vector, then generate the next index in the interval 0..n-2, and so on.
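
A minimal sketch of that idea, using swap-and-pop so removal is O(1) rather than a linear-time erase() (std::rand() is just a placeholder generator):

#include <cstdlib>
#include <vector>

// Pick a random element from the pool, then remove it in O(1) by
// overwriting it with the last element and shrinking the vector.
unsigned int draw(std::vector<unsigned int>& pool)
{
    std::size_t i = std::rand() % pool.size();  // index into what's left
    unsigned int value = pool[i];
    pool[i] = pool.back();
    pool.pop_back();
    return value;
}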

Péter Török
A: 

How many random numbers do you need? Maybe you can apply a shuffle algorithm to a precalculated array of random numbers?

tanascius
A: 

There is no way a random generator will choose its output depending on previously output values, because then it wouldn't be random. However, you can improve performance by using several pools of random values, each combined with a different salt value; this divides the quantity of numbers you have to check against by the number of pools.

kbok
Actually, in software we only have pseudo-random number generators, and for them every number generated does indeed depend on the previously generated number.
sbi
A: 

If the range of the random numbers doesn't matter, you could use a really large range and hope you don't get any collisions. If your range is billions of times larger than the number of elements you expect to create, the chance of a collision is small but still there. If the numbers don't have to have an actual random distribution, you could use a two-part number of the form {counter}{random x digits}: that ensures a unique number, but it won't be randomly distributed.
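
A sketch of the {counter}{random} composite (names are mine; as said above, the result is unique but not randomly distributed):

#include <cstdint>
#include <cstdlib>

// A 32-bit counter in the high bits guarantees uniqueness;
// the random low bits only add spread, not uniformity.
uint64_t next_unique()
{
    static uint32_t counter = 0;
    uint32_t noise = static_cast<uint32_t>(std::rand());
    return (static_cast<uint64_t>(counter++) << 32) | noise;
}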

Joshua
A: 

There's no approach based purely on generate-and-check that isn't O(n^2) in the number of results returned so far: every time a number is generated, you need to check it against every result so far. Additionally, think about what happens when you're returning, e.g., the 1000th number out of 1000: you will need on average 1000 tries until the random algorithm comes up with the last unused number, with each attempt requiring an average of 499.5 comparisons against the already-generated numbers.

It should be clear from this that your description as posted is not quite exactly what you want. The better approach, as others have said, is to take a list of e.g. 1000 numbers upfront, shuffle it, and then return numbers from that list incrementally. This will guarantee you're not returning any duplicates, and return the numbers in O(1) time after the initial setup.

Andrzej Doyle
+2  A: 
#include <algorithm>  // std::random_shuffle
#include <vector>

const unsigned int N = 1000;
std::vector<unsigned int> vals(N);
for (unsigned int i = 0; i < vals.size(); ++i)
    vals[i] = i;                                // fill with 0, 1, ..., N-1
std::random_shuffle(vals.begin(), vals.end());  // random permutation; seed with std::srand() first

unsigned int random_number_1 = vals[0];
unsigned int random_number_2 = vals[1];
unsigned int random_number_3 = vals[2];
// etc.
Viktor Sehr
A: 

Use the HashSet generic class. This class does not allow duplicate values, so you can put all of your generated numbers into a HashSet and check whether a number already exists. A HashSet can determine the existence of an item very quickly, and it does not get slower as the collection grows - that is its biggest advantage.

For example:

HashSet<int> numbers = new HashSet<int>();
numbers.Add(1);
numbers.Add(2);
numbers.Add(1);  // ignored: 1 is already in the set
foreach (var item in numbers)
{
    Console.WriteLine(item);
}
Console.ReadKey();
Freshblood
Nice solution. I would have given my +1 if this was `C++` as the OP requested.
ereOn
In my chess engine I created a hash table, and as expected the algorithm doesn't get slow when adding new items, unlike the List generic class. Then I realized HashSet comes with 4.0, if I am not wrong. You can also create one yourself in C++ if it doesn't exist.
Freshblood
I hadn't realized this question was tagged C++ :) Now I see it.
Freshblood
In C++, we prefer stack objects over heap objects, a hash set is spelled `std::unordered_set<>`, adding to it is done using its `insert()` member function, there is no `foreach` loop (yet), and we write lines to the console by doing `std::cout << item << '\n';` and read using `std::getline()`. But apart from every single line containing at least one error, it's a solid foundation for a valid solution (where does the random number generator come in, though?), so I won't down-vote it. `:)`
sbi
A: 

You can allocate an array of bits, with one bit for each possible number, and check/set the corresponding bit for every generated number. For example, for numbers from 0 to 65535 you need only 8192 bytes (8 KB) of memory.
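
A minimal sketch of this (std::vector<bool> is bit-packed, so it matches the one-bit-per-number budget; std::rand() is a placeholder generator):

#include <cstdlib>
#include <vector>

// Retry until a value whose bit is still clear comes up, then mark it used.
unsigned int draw_unused(std::vector<bool>& used)
{
    for (;;) {
        unsigned int n = std::rand() % used.size();
        if (!used[n]) {        // first time this value has come up
            used[n] = true;
            return n;
        }
    }
}

Note that retries become more frequent as the range fills up.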

mOlind
A: 

Previously.

plinth
Pointing out a possible dupe both as an answer and as a comment on the question?
Mike DeSimone
+10  A: 

I think there are 3 possible approaches; depending on the range size and the performance pattern you need, a different algorithm may be preferable.

  1. Create a random number and see if it is in a (sorted) list of numbers already returned. If not, add it and return it; otherwise try another.
    • Your list grows and consumes memory with every number you need. If every number is 32 bits, it grows by at least 32 bits every time.
    • Every new random number increases the hit ratio, and this makes it slower.
    • O(n^2) - I think
  2. Create a bit array with one bit for every number in the range. Mark a bit with 1/true once that number has been returned.
    • Every number now takes only 1 bit. This can still be a problem if the range is big, but it never costs more than 1 bit per possible value.
    • Every new random number increases the hit ratio, and this makes it slower.
    • Expected O(n log n) if you draw most of the range; close to O(n) when the range is much larger than the count you need.
  3. Pre-populate a list with all the numbers, shuffle it, and return the Nth number.
    • The list does not grow, and returning numbers does not get slower,
    • but generating the list may take a long time and a lot of memory.
    • O(1) per number, after the setup.

Depending on the speed you need, you could store the lists in a database; there's no need for them to be in memory except for speed.

GvS
A: 

Here's an interesting solution I came up with:

Assume you have numbers 1 to 1000 - and you don't have enough memory.

You could put all 1000 numbers into an array and remove them one by one, but you'd get a memory overflow error.

You could split the array in two, so you have an array of 1-500 and one empty array

You could then check if the number exists in array 1, or doesn't exist in the second array.

So assuming you have 1000 numbers, you can get a random number from 1-1000. If it's less than 500, check array 1 and remove it if present. If it's NOT in array 2, you can add it.

This halves your memory usage.

If you propagate this using recursion, you can split your 500 array into a 250 array and an empty array.

Assuming empty arrays use no space, you can decrease your memory usage quite a bit.

Searching will be massively faster too, because if you break it down a lot and generate a number such as 29: it's less than 500, less than 250, less than 125, less than 62, less than 31, greater than 15 - so you do those 6 comparisons, then search an array containing about 16 items, which on average takes 16/2 = 8 checks.

I should patent this search, although I bet it already exists!

SLC
If your first 500 numbers are all greater than 500, then your first list stays full because you never removed any elements, and your second list is also full because you added all the elements. So you're back to not having enough memory.
Dennis Zickefoose
+4  A: 

If you use a simple 32-bit linear congruential RNG (such as the so-called "Minimal Standard"), all you have to do is store the seed value you use and compare each generated number to it. If you ever reach that value again, your sequence is starting to repeat itself and you're out of values. This is O(1), but of course limited to 2^32-1 values (though I suppose you could use a 64-bit version as well).
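
As an illustration (my sketch, not code from the answer): the Park-Miller "minimal standard" generator with the stop-when-the-seed-reappears check. The state must start in [1, 2^31-2], and the period is 2^31-2:

#include <cstdint>

// Lehmer generator: state = 16807 * state mod (2^31 - 1).
// next() returns false once the sequence wraps back to the original seed.
struct MinStd
{
    uint32_t seed;   // remembered so a full cycle can be detected
    uint32_t state;
    explicit MinStd(uint32_t s) : seed(s), state(s) {}
    bool next(uint32_t& out)
    {
        state = static_cast<uint64_t>(state) * 16807u % 2147483647u;
        if (state == seed)
            return false;      // full period reached; values would repeat now
        out = state;
        return true;
    }
};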

Drew Hall
Personally I think this is the best answer here, for 2 reasons: 1) it's the most efficient in terms of time and space, and 2) it indicates that the solution is a function of the random number generator used.
sashang
A: 

Especially given the desired number of values, you want a Linear Feedback Shift Register.

Why?

No shuffle step, nor a need to keep track of values you've already hit. As long as you go less than the full period, you should be fine.

It turns out that the Wikipedia article has some C++ code examples which are more tested than anything I would give you off the top of my head. Note that you'll want to be pulling values from inside the loops -- the loops just iterate the shift register through. You can see this in the snippet here.
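
For illustration, a sketch along the lines of the Wikipedia Galois-LFSR example (16 bits, taps 16/14/13/11; this is my paraphrase, not the article's exact code):

#include <cstdint>

// One step of a 16-bit Galois LFSR. Starting from any nonzero state, repeated
// calls visit all 65535 nonzero 16-bit values before the sequence repeats.
uint16_t lfsr_next(uint16_t state)
{
    unsigned lsb = state & 1u;  // the bit shifted out is the "output" bit
    state >>= 1;
    if (lsb)
        state ^= 0xB400u;       // toggle the taps (bits 16, 14, 13, 11)
    return state;
}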

(Yes, I know this was mentioned briefly in the dupe - I saw it as I was revising. Given that it hasn't been brought up here and is the best way to solve the poster's question, I think it should be brought up again.)

Arthur Shipkowski
A: 

Let's say size = 100,000; create an array of that size. Generate random numbers and put them into the array. The problem is: which index should a number go to? randomNumber % size will give you the index.

When you put in the next number, use that function for the index and check whether a value already exists at that position. If not, store it; if it does, generate a new number and try again. This way you can insert very quickly. The disadvantage is that you can never store two numbers whose last section (the remainder modulo the size) is the same.

For example, 1231232444556 and 3458923444556 have the same last section,

so you would never have both in your list, even though they are totally different numbers apart from that last section.

Freshblood
+2  A: 

There is a class of pseudo-random number generators that, I believe, has the properties you want: the Linear congruential generator. If defined properly, it will produce a list of integers from 0 to N-1, with no two numbers repeating until you've used all of the numbers in the list once.

#include <stdint.h>

/*
 * Choose these values as follows:
 *
 * The MODULUS and INCREMENT must be relatively prime.
 * The MULTIPLIER-1 must be divisible by all prime factors of the MODULUS.
 * The MULTIPLIER-1 must be divisible by 4, if the MODULUS is divisible by 4.
 *
 * In addition, modulus must be <= 2**32 (0x0000000100000000ULL).
 *
 * A small example would be 8, 5, 3.
 * A larger example would be 256, 129, 251.
 * A useful example would be 0x0000000100000000ULL, 1664525, 1013904223.
 */

#define MODULUS    (0x0000000100000000ULL)
#define MULTIPLIER (1664525)
#define INCREMENT  (1013904223)

static uint64_t seed;

uint32_t lcg( void ) {
    uint64_t temp;

    temp = seed * MULTIPLIER + INCREMENT;   // 64-bit intermediate product
    seed = temp % MODULUS;                  // 32-bit end-result

    return (uint32_t) seed;
}

All you have to do is choose a MODULUS such that it is larger than the number of numbers you'll need in a given run.
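
To connect this to the question's numbers (100,000 unique values below 1,000,000,000), one option is to reject out-of-range outputs; uniqueness survives rejection because the sequence never repeats within a period. A sketch (the helper is mine, not part of the answer):

#include <cstddef>
#include <stdint.h>
#include <vector>

uint32_t lcg( void );  // the generator defined above

// Draw `count` distinct values in [0, limit) by skipping out-of-range outputs.
std::vector<uint32_t> draw_unique(std::size_t count, uint32_t limit) {
    std::vector<uint32_t> out;
    out.reserve(count);
    while (out.size() < count) {
        uint32_t v = lcg();
        if (v < limit)         // skipped values are never reused, so no duplicates
            out.push_back(v);
    }
    return out;
}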

Craig Trader
A: 

First off, there's a huge difference between random and pseudorandom. There's no way to generate perfectly random numbers from a deterministic process (such as a computer) without bringing in some physical process like latency between keystrokes or another entropy source.

The approach of saving all the numbers generated will slow down the computation rather quickly; the more numbers you have, the larger your storage needs, until you've filled up all available memory. A better method would be (as someone's already suggested) using a well known pseudorandom number generator such as the Linear Congruential Generator; it's super fast, requiring only modular multiplication and addition, and the theory behind it gets a lot of mention in Vol. 2 of Knuth's TAOCP. That way, the theory involved guarantees a rather large period before repetition, and the only storage needed are the parameters and seed used.

Graham Enos
There's a large period before the *seed* of the PRNG repeats. The output will repeat far more often.
Ben Voigt
Depends on which generator you use, and on whether you're generating integers or floats between 0 and 1 with limited precision. With proper parameters, the LCG is specifically set up to generate all integers mod a certain modulus.
Graham Enos
A: 

If you have no problem with one value being computable from the previous one, an LFSR or LCG is fine. If you don't want one output value to be derivable from another, you can use a block cipher in counter mode to generate the output sequence, given that the cipher's block length is equal to the output length.
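
A toy sketch of the counter-mode idea (my own illustration, and emphatically not a secure cipher): encrypting the counter 0, 1, 2, ... under a fixed key gives a collision-free sequence, because a block cipher is a bijection on its block. A tiny Feistel network stands in for the cipher here; a balanced Feistel network is invertible no matter what the round function does:

#include <cstdint>

// Arbitrary mixing; any function of (half, key) keeps the network invertible.
static uint16_t round_fn(uint16_t half, uint16_t key)
{
    uint32_t x = static_cast<uint32_t>(half) * 0x9E37u + key;
    return static_cast<uint16_t>(x ^ (x >> 7));
}

// 4-round balanced Feistel permutation of a 32-bit block. Feed it the
// counter values 0, 1, 2, ... and the outputs never collide.
uint32_t permute32(uint32_t counter, const uint16_t keys[4])
{
    uint16_t left = static_cast<uint16_t>(counter >> 16);
    uint16_t right = static_cast<uint16_t>(counter & 0xFFFFu);
    for (int r = 0; r < 4; ++r) {
        uint16_t tmp = right;
        right = left ^ round_fn(right, keys[r]);
        left = tmp;
    }
    return (static_cast<uint32_t>(left) << 16) | right;
}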

Rudi