I would like to get 2 random different elements from an std::vector. How can I do this so that:
- It is fast (it is done thousands of times in my algorithm)
- It is elegant
- The elements selection is really uniformly distributed
I would like to get 2 random different elements from an std::vector. How can I do this so that:
Not elegant, but extreamly simple: just draw a random number in [0, vector.size()[ and check it's not twice the same.
Simplicity is also in some way elegance ;)
What do you call fast ? I guess this can be done thousands of times within a millisecond.
For elegance and simplicty:
void Choose (const int size, int &first, int &second)
{
// pick a random element
first = rand () * size / MAX_RAND;
// pick a random element from what's left (there is one fewer to choose from)...
second = rand () * (size - 1) / MAX_RAND;
// ...and adjust second choice to take into account the first choice
if (second >= first)
{
++second;
}
}
using first and second to index the vector.
For uniformness, this is very tricky since as size approaches RAND_MAX there will be a bias towards the lower values and if size exceeds RAND_MAX then there will be elements that are never chosen. One solution to overcome this is to use a binary search:
int GetRand (int size)
{
int lower = 0, upper = size;
do
{
int mid = (lower + upper) / 2;
if (rand () > RAND_MAX / 2) // not a great test, perhaps use parity of rand ()?
{
lower = mid;
}
else
{
upper = mid;
}
} while (upper != lower); // this is just to show the idea,
// need to cope with lower == mid and lower != upper
// and all the other edge conditions
return lower;
}
How about using a std::queue
and doing std::random_shuffle
on them. Then just pop til your hearts content?
Whenever need something random, you are going to have various questions about the random number properties regarding uniformity, distribution and so on.
Assuming you've found a suitable source of randomness for your application, then the simplest way to generate pairs of uncorrelated entries is just to pick two random indexes and test them to ensure they aren't equal.
Given a vector of N+1 entries, another option is to generate an index i in the range 0..N. element[i] is choice one. Swap elements i and N. Generate an index j in the range 0..(N-1). element[j] is your second choice. This slowly shuffles your vector which may be problematical, but it can be avoided by using a second vector which holds indexes into the first, and shuffling that. This method trades a swap for the index comparison and tends to be more efficient for small vectors (a dozen or fewer elements, typically) as it avoids having to do multiple comparisons as the number of collisions increase.
You might wanna look into the gnu scientific library. There are some pretty nice random number generators in there that are guaranteed to be random down to the bit level.
What you need is to generate M uniformly distributed random numbers from [0, N) range, but there is one caveat here.
One needs to note that your statement of the problem is ambiguous. What is meant by the uniformly distributed selection? One thing is to say that each index has to be selected with equal probability (of M/N, of course). Another thing is to say that each two-index combination has to be selected with equal probability. These two are not the same. Which one did you have in mind?
If M is considerably smaller than N, the classic algorithm for selecting M numbers out of [0, N) range is Bob Floyd algorithm that can be found in Bentley's "Programming Peals" book. It looks as follows (a sketch)
for (int j = N - M; i < N; ++j) {
int rand = random(0, j); // generate a random integer in range [0, j]
if (`rand` has not been generated before)
output rand;
else
output j;
}
In order to implement the check of whether rand
has already been generated or not for relatively high M some kind of implementation of a set is necessary, but in your case M=2 it is straightforward and easy.
Note that this algorithm distributes the sets of M numbers uniformly. Also, this algorithm requires exactly M iterations (attempts) to generate M random numbers, i.e. it doesn't follow that flawed "trial-and-error" approach often used in various ad-hoc algorithms intended to solve the same problem.
Adapting the above to your specific situation, the correct algorithm will look as follows
first = random(0, N - 2);
second = random(0, N - 1);
if (second == first)
second = N - 1;
(I leave out the internal details of random(a, b)
as an implementation detail).
It might not be immediately obvious why the above works correctly and produces a truly uniform distribution, but it really does :)