Dear all,

I have the following C++ code that tries to generate random numbers. The idea is that, given some rate "x" and a number of runs, we hope it will produce a hit about (x * number of runs) times.

#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>
#include <cstdlib>   // srand, rand
#include <ctime>     // time
using namespace std;


int main  () {

    // Initialize Random Seed 
    srand (time(NULL));

    string line;
    double SubsRate = 0.003;  
    double nofRuns   = 1000000;

    for (unsigned i=0; i < nofRuns ; i++) {

        int toSub = rand() % 1000 + 1;

        if (toSub == (SubsRate * 1000)) {
         cout << toSub << " Sub"  << endl; 
        }

    }
    return 0;
}

Hence if we run the code above K times with this command:

$ a=0 ; while test $a -lt 10 ; do ./MyCode | wc -l ; a=`expr $a + 1` ; done

We expect it to generate the number "3" about 3000 times in 1M runs. But somehow my code above only generates the number "3" about 900 ~ 1000 times.

How can I improve on my code above?

+1  A: 

I think your math here is a little off...

According to the code you have up there, you will uniformly be generating random numbers between 1 and 1000.

Your check (toSub==(SubsRate*1000)) merely tests whether the number you generated is exactly 3 (since rate*1000=3). Hence you will get a 3 only about once in every 1000 draws, i.e. about 1000 times in 1M runs, not 3000 times.

You didn't mention what the range of your numbers is, but generally speaking, if you want to generate a number between IMin and IMax (both inclusive) using a uniform distribution (each value has the same chance of appearing), then you simply write:

int I = IMin + rand() % (IMax - IMin + 1);

In this case, if you wanted each number to appear once every 3000 draws, you would have to draw a random number between 1 and 3000. Otherwise, you are not talking about a uniform distribution.

Uri
As noted in the answer to another question (http://stackoverflow.com/questions/614012/howto-restart-loop-in-c-finding-unique-sequence-over-random-runs/614029#614029), the formula above is not a very good way to get a uniform distribution.
Paul
+3  A: 

In other words, you are checking that the result == 3, not that the result is <= 3.

A 3 will only come up one time in 1000, but a result <= 3 will come up at the rate you want.

Daniel Von Fange
<= 3 you mean?
neversaint
You are right. I did not take his +1 into account.
Daniel Von Fange
+2  A: 
  • You can expect to get the number 3 one time out of 1000, i.e. 1000 times out of 1M.
  • You can expect to get the number 9 one time out of 1000, i.e. 1000 times out of 1M.
  • You can expect to get the number 7 one time out of 1000, i.e. 1000 times out of 1M.
  • You can expect to get any one of 3, 7 or 9 three times out of 1000, i.e. 3000 times out of 1M.

Cheers, V.

vladr
+1  A: 

As others have mentioned, your original was testing whether the random number is equal to the fraction of the distribution you wanted, not below that.

rand() generates a value between 0 and RAND_MAX (inclusive). RAND_MAX may be quite small - a typical value is 32767. If you use modulo 1000 with that RAND_MAX, then 33 of the values rand() returns map to each result from 0 to 767, but only 32 map to each result from 768 to 999. So that's a little skewed.

You seem to have pulled 1000 out of the air. If instead you scale RAND_MAX itself by the proportion of the distribution you want, then you don't get the skewing effect, nor do you have to process the output of rand() to make the comparison:

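#include <cstdlib>   // srand, rand, RAND_MAX
#include <ctime>     // time
#include <iostream>
using namespace std;

int main () {
    srand (time(NULL));

    double subsRate = 0.003;

    unsigned int nofRuns = 1000000;

    // Scale in floating point so that RAND_MAX + 1 cannot overflow an integer type
    int cutoff = (int) ( subsRate * ( RAND_MAX + 1.0 ) );

    // rand() < cutoff holds for roughly subsRate of all draws
    for (unsigned int i = 0; i < nofRuns ; i++)
        if ( rand() < cutoff )
            cout << " Sub " << endl;

    return 0;
}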
Pete Kirkham