
I would like to get two random, distinct elements from a std::vector. How can I do this so that:

  • It is fast (it is done thousands of times in my algorithm)
  • It is elegant
  • The element selection is truly uniformly distributed
+1  A: 

Not elegant, but extremely simple: just draw a random number in [0, vector.size()) and check that you don't get the same one twice.

Simplicity is also in some way elegance ;)

What do you mean by fast? I guess this can be done thousands of times within a millisecond.
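A minimal sketch of this draw-and-retry idea, assuming C++11 `<random>` is available instead of `rand()` (the function name `pick_two_by_retry` is made up for illustration). Each redraw succeeds with probability (size - 1)/size, so the loop ends quickly on average, although no fixed upper bound on the number of redraws exists.

```cpp
#include <cstddef>
#include <random>
#include <utility>

// Draw two distinct indices in [0, size) by redrawing the second one
// until it differs from the first. Requires size >= 2.
std::pair<std::size_t, std::size_t> pick_two_by_retry(std::size_t size,
                                                       std::mt19937& gen) {
    // uniform_int_distribution uses inclusive bounds: [0, size - 1]
    std::uniform_int_distribution<std::size_t> dist(0, size - 1);
    std::size_t first = dist(gen);
    std::size_t second = dist(gen);
    while (second == first) {  // expected number of redraws is 1 / (size - 1)
        second = dist(gen);
    }
    return {first, second};
}
```

Seed one `std::mt19937` once and reuse it across calls rather than constructing a new engine per pick.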

Tristram Gräbener
OK, so you would use rand(). The problem then is that if I do rand() % vector.size(), the numbers are not uniformly distributed.
Peter Smit
If the problem is randomization, then just use a good one: http://www.boost.org/doc/libs/1_42_0/libs/random/index.html
Alexander Poluektov
Peter, never use `rand() % vector.size()`; see http://linux.die.net/man/3/rand. Use something like `rand() * vector.size() / RAND_MAX`; this is promised to be uniform.
Artyom
@Artyom Do you promise that this will never overflow? By the way, the page you linked to makes no promises about uniformity.
Peter Smit
rand() is an LCG, right? You might want to read up on the disadvantages on the Wikipedia page. The distribution seems to be uneven: http://en.wikipedia.org/wiki/Linear_congruential_generator#Advantages_and_disadvantages_of_LCGs
Lucas Lindström
@Peter: per http://linux.die.net/man/3/rand, `0 <= rand() <= RAND_MAX`, so if you write this correctly it does not overflow. As for uniformity: if rand() is not uniform enough (and it is for most practical cases), I can only suggest using some cryptographic algorithms...
Artyom
The non-uniformity happens when `(RAND_MAX + 1) % vector.size() != 0`: rand() has RAND_MAX + 1 possible outputs, so the first `(RAND_MAX + 1) % vector.size()` indices each have a probability that is higher by `1/(RAND_MAX + 1)`.
MSalters
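The effect is easiest to see with a toy generator. The sketch below assumes a hypothetical RAND_MAX of 9, so rand() would have 10 equally likely outputs (0 through 9 inclusive), and counts every residue of `r % n` exhaustively; `modulo_counts` is an illustrative name, not a library function.

```cpp
#include <vector>

// Count how often each result of r % n occurs when r runs over every value
// a hypothetical generator can return, 0..rand_max inclusive.
std::vector<int> modulo_counts(int rand_max, int n) {
    std::vector<int> counts(n, 0);
    for (int r = 0; r <= rand_max; ++r) {
        ++counts[r % n];
    }
    return counts;
}
```

`modulo_counts(9, 3)` returns `{4, 3, 3}`: index 0 comes up 4 times out of 10, the others 3 times. With 8 outputs and n = 4 (`modulo_counts(7, 4)`) the result is `{2, 2, 2, 2}`, because 8 is a multiple of 4.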
The problem with your solution is that you don't define the algorithm to "check that it's not the same one twice." The default solution would be a while loop that redraws until the two differ, but that is not guaranteed to ever terminate: it's possible that it will randomly, forever, pick the same two numbers. (Pedantically, the zero- or one-element vector is not handled here either.)
Conspicuous Compiler
+3  A: 

For elegance and simplicity:

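#include <cstdlib> // rand, RAND_MAX

void Choose (const int size, int &first, int &second)
{
  // pick a random element
  // (widened to long long so that rand () * size cannot overflow an int;
  //  note that rand () == RAND_MAX still yields size, one past the end)
  first = static_cast<int> (static_cast<long long> (rand ()) * size / RAND_MAX);
  // pick a random element from what's left (there is one fewer to choose from)...
  second = static_cast<int> (static_cast<long long> (rand ()) * (size - 1) / RAND_MAX);
  // ...and adjust second choice to take into account the first choice
  if (second >= first)
  {
     ++second;
  }
}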

Then use first and second to index the vector.

Uniformity is very tricky, since as size approaches RAND_MAX there will be a bias towards the lower values, and if size exceeds RAND_MAX there will be elements that are never chosen. One solution to overcome this is to use a binary search:

#include <cstdlib> // rand, RAND_MAX

// Returns an index in [0, size) (size >= 1) by repeatedly halving the range
// with one random "coin flip" per step.
int GetRand (int size)
{
  int lower = 0, upper = size;  // candidate range is [lower, upper)
  while (upper - lower > 1)
  {
    int mid = lower + (upper - lower) / 2;

    if (rand () > RAND_MAX / 2) // not a great test, perhaps use parity of rand ()?
    {
       lower = mid;  // keep the upper half [mid, upper)
    }
    else
    {
       upper = mid;  // keep the lower half [lower, mid)
    }
  }

  return lower;
}
Skizz
Simple and nice approach. Maybe with a different random generator, but I think my code will look like this. This is guaranteed to stop (with endless while loops you're never sure :) ) :)
Peter Smit
The first code snippet does not work: what happens if you get the very last element for `second`? You then increment the index and run off the end of the vector. Remove the `(size - 1)` and then you can cache the `size / RAND_MAX`.
graham.reeds
@graham.reeds: From memory, I assumed rand () generated a value between 0 and (RAND_MAX - 1) inclusive, i.e. it would never return the value RAND_MAX. On checking the documentation, it appears to generate numbers in the inclusive range 0 to RAND_MAX. So just replace size with (size - 1) and (size - 1) with (size - 2). But the algorithm is sound, which is the important thing.
Skizz
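An alternative fix for the out-of-range problem, sketched under the assumption that floating-point scaling is acceptable: divide by RAND_MAX + 1.0 instead of RAND_MAX, so the scaled value can never reach n even when rand() returns RAND_MAX. The names `ScaledRand` and `ChooseInRange` are illustrative, and the small bias discussed in the comments on the question is still present.

```cpp
#include <cstdlib>  // std::rand, std::srand, RAND_MAX

// Map one rand() result onto [0, n). rand() / (RAND_MAX + 1.0) is always
// strictly below 1, so the result is always strictly below n.
int ScaledRand(int n)
{
  return static_cast<int>(n * (std::rand() / (RAND_MAX + 1.0)));
}

// Same structure as Choose: pick one of size elements, then one of the
// remaining size - 1, and shift it past first. Requires size >= 2.
void ChooseInRange(const int size, int &first, int &second)
{
  first = ScaledRand(size);       // 0 .. size - 1
  second = ScaledRand(size - 1);  // 0 .. size - 2
  if (second >= first)
  {
    ++second;                     // 0 .. size - 1, never equal to first
  }
}
```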
I have trouble understanding how GetRand can be uniform, except when the number of possible cases is a power of 2 (I'm too lazy to check whether size is a valid result or not).
AProgrammer
@AProgrammer: You're right, the code as given is probably only uniform when size is a power of two. You can overcome this by changing the decision to take into account how many numbers lie on either side of the decision point: `if (rand () * (left_size + right_size) / (left_size * RAND_MAX) < 1) use_left_side else use_right_side`
Skizz
@Skizz, you'll still get quantization errors. I don't know of a way to get uniform results without discarding the result of some calls to rand(). I'll post a way when I get home if I think of one and there is no applicable answer by then.
AProgrammer
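Discarding some rand() results is indeed the usual way out. A minimal sketch of that rejection technique (a standard fix for modulo bias, not code from this thread; `UniformIndex` is a hypothetical name), assuming 0 < n <= RAND_MAX: only the largest multiple of n among rand()'s RAND_MAX + 1 outputs is accepted, so every residue gets exactly the same number of outcomes.

```cpp
#include <cstdlib>  // std::rand, std::srand, RAND_MAX

// Unbiased index in [0, n), built on std::rand(). Assumes 0 < n <= RAND_MAX.
int UniformIndex(int n)
{
  // rand() has RAND_MAX + 1 possible outputs (0..RAND_MAX inclusive).
  const long long outcomes = static_cast<long long>(RAND_MAX) + 1;
  const long long limit = outcomes - outcomes % n;  // largest multiple of n
  long long r;
  do
  {
    r = std::rand();  // reject values in the incomplete top block
  } while (r >= limit);
  return static_cast<int>(r % n);
}
```

At least half of all rand() outputs are accepted, so on average the loop needs at most two calls, although, like any rejection loop, it has no fixed bound.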
+3  A: 

How about copying the elements into a std::deque and calling std::random_shuffle on it? Then just pop till your heart's content.
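A sketch of the shuffle-then-pop idea, with two assumptions: it uses std::deque (std::queue has no iterators, so it cannot be shuffled directly), and it uses std::shuffle with a `<random>` engine, since std::random_shuffle was deprecated in C++14 and removed in C++17. `shuffled_copy` is an illustrative name.

```cpp
#include <algorithm>
#include <deque>
#include <random>
#include <vector>

// Copy the elements once and shuffle the copy; callers then pop from the front.
template <typename T>
std::deque<T> shuffled_copy(const std::vector<T>& v, std::mt19937& gen) {
    std::deque<T> d(v.begin(), v.end());
    std::shuffle(d.begin(), d.end(), gen);  // uniform random permutation
    return d;
}
```

Usage: take `d.front()` and `d.pop_front()` twice to get two distinct elements. Each shuffle costs a full pass over the data, so this pays off mainly when many elements are drawn from the same shuffle.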

graham.reeds
This is O(N) time and O(N) space. Choosing 2 random elements can be done in O(1) time and O(1) space.
KennyTM
For all its shortness, this is one of few answers that (1) guarantees non-infinite run time and (2) doesn't use rand() incorrectly. The down side is that random_shuffle() may not be fast, as requested by the asker.
Conspicuous Compiler
But will the single shuffle be slower than all the rand() calls added together?
graham.reeds
@graham.reeds Sometimes it will be and sometimes it won't, since the number of iterations is unknown, which is why many of the other answers here don't meet the asker's criteria.
Conspicuous Compiler
A: 

Whenever you need something random, you are going to face various questions about the properties of the random numbers: uniformity, distribution and so on.

Assuming you've found a suitable source of randomness for your application, then the simplest way to generate pairs of uncorrelated entries is just to pick two random indexes and test them to ensure they aren't equal.

Given a vector of N+1 entries, another option is to generate an index i in the range 0..N; element[i] is your first choice. Swap elements i and N. Generate an index j in the range 0..(N-1); element[j] is your second choice. This slowly shuffles your vector, which may be problematic, but that can be avoided by using a second vector that holds indexes into the first and shuffling that instead. This method trades a swap for the index comparison and tends to be more efficient for small vectors (typically a dozen or fewer elements), as it avoids having to do multiple comparisons as the number of collisions increases.
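A minimal sketch of the swap method on a separate index vector, assuming C++11 `<random>`; `pick_two_by_swap` is a hypothetical name, and `idx.size()` plays the role of N+1. Because the picks are uniform whatever order `idx` is in, the same index vector can be reused across calls without resetting it.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Pick two distinct positions of a data vector without touching the data.
// idx holds a permutation of 0..n-1 (n >= 2) and is reordered in place.
std::pair<std::size_t, std::size_t> pick_two_by_swap(std::vector<std::size_t>& idx,
                                                      std::mt19937& gen) {
    const std::size_t last = idx.size() - 1;          // plays the role of N
    std::uniform_int_distribution<std::size_t> d1(0, last);
    std::size_t i = d1(gen);
    std::size_t first = idx[i];
    std::swap(idx[i], idx[last]);                     // park first choice at the end
    std::uniform_int_distribution<std::size_t> d2(0, last - 1);
    std::size_t second = idx[d2(gen)];                // choose among the rest
    return {first, second};
}
```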

swestrup
A: 

You might want to look into the GNU Scientific Library (GSL). It has some pretty nice random number generators that are guaranteed to be random down to the bit level.

Flamewires
+4  A: 

What you need is to generate M distinct, uniformly distributed random numbers from the [0, N) range, but there is one caveat here.

Note that your statement of the problem is ambiguous. What is meant by uniformly distributed selection? One thing is to say that each index has to be selected with equal probability (M/N, of course). Another is to say that each two-index combination has to be selected with equal probability. These two are not the same. Which one did you have in mind?

If M is considerably smaller than N, the classic algorithm for selecting M numbers out of the [0, N) range is Bob Floyd's algorithm, which can be found in Bentley's "Programming Pearls" book. It looks as follows (a sketch):

for (int j = N - M; j < N; ++j) {

  int rand = random(0, j); // generate a random integer in range [0, j]

  if (rand has not been generated before)
    output rand;
  else
    output j;
}

For relatively large M, checking whether rand has already been generated requires some kind of set implementation, but in your case (M = 2) it is straightforward.

Note that this algorithm distributes the sets of M numbers uniformly. Also, it requires exactly M iterations (attempts) to generate M random numbers, i.e. it doesn't follow the flawed "trial-and-error" approach often used in various ad hoc algorithms intended to solve the same problem.
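A runnable sketch of the general algorithm for arbitrary M, assuming C++11 `<random>` for random(0, j) and std::unordered_set for the "generated before" check; `floyd_sample` is an illustrative name. The set of returned values is uniform, but the order in which they are returned is not, so shuffle the result if order matters.

```cpp
#include <random>
#include <unordered_set>
#include <vector>

// Bob Floyd's algorithm: M distinct values from [0, N) with exactly M draws.
// Assumes 0 <= M <= N.
std::vector<int> floyd_sample(int N, int M, std::mt19937& gen) {
    std::unordered_set<int> seen;
    std::vector<int> out;
    out.reserve(M);
    for (int j = N - M; j < N; ++j) {
        std::uniform_int_distribution<int> dist(0, j);  // random(0, j), inclusive
        int r = dist(gen);
        // Earlier iterations only produced values <= j - 1, so j is always free.
        int pick = seen.count(r) ? j : r;
        seen.insert(pick);
        out.push_back(pick);
    }
    return out;
}
```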

Adapting the above to your specific situation, the correct algorithm looks as follows:

first = random(0, N - 2);  
second = random(0, N - 1);
if (second == first)
  second = N - 1;

(I leave out the internal details of random(a, b) as an implementation detail).
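One way to fill in random(a, b), assuming C++11 `<random>` is available; `random_in` and `pick_two` are illustrative names (`random_in` avoids clashing with POSIX `random()`).

```cpp
#include <random>
#include <utility>

// random(a, b) from the answer: a uniform integer in [a, b], inclusive.
int random_in(int a, int b, std::mt19937& gen) {
    return std::uniform_int_distribution<int>(a, b)(gen);
}

// Floyd's algorithm specialised to M = 2. Requires N >= 2.
std::pair<int, int> pick_two(int N, std::mt19937& gen) {
    int first = random_in(0, N - 2, gen);
    int second = random_in(0, N - 1, gen);
    if (second == first)
        second = N - 1;  // first is at most N - 2, so this cannot collide
    return {first, second};
}
```

The pair is uniform as an unordered set, but `first` is never N - 1; if the order of the two elements matters to the caller, swap them with probability 1/2.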

It might not be immediately obvious why the above works correctly and produces a truly uniform distribution, but it really does :)
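The reason is a counting argument: the two draws have (N - 1) · N equally likely outcomes, and every unordered pair {x, y} arises from exactly two of them (for y = N - 1, from first = x with the raw second draw equal to x or N - 1; otherwise from first = x, second = y or first = y, second = x). Each of the N(N - 1)/2 pairs therefore has probability 2/(N(N - 1)). A brute-force sketch that enumerates every outcome and checks this (`pair_counts` is an illustrative name):

```cpp
#include <map>
#include <utility>

// Enumerate every equally likely outcome of the two draws
// (first in [0, N-2], raw second in [0, N-1]) and count how often
// each unordered pair {lo, hi} results.
std::map<std::pair<int, int>, int> pair_counts(int N) {
    std::map<std::pair<int, int>, int> counts;
    for (int first = 0; first <= N - 2; ++first) {
        for (int raw = 0; raw <= N - 1; ++raw) {
            int second = (raw == first) ? N - 1 : raw;  // the collision rule
            int lo = first < second ? first : second;
            int hi = first < second ? second : first;
            ++counts[{lo, hi}];
        }
    }
    return counts;
}
```

For every N >= 2, the map has N(N - 1)/2 entries and each count is exactly 2.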

AndreyT
This truly is beautiful code. Thanks.
ufotds