views:

488

answers:

7

Is this how PHP implemented random number generating?

Say I want to calculate a yes or a no. Everytime I have a certain probability percentage (say: 0,05% for this example).

I do:

$possibilities = 100 / $probabilityPercentage; //$possibilities = 2000
$yes = rand(1,$possibilities);

$yesCheck = $possiblities;            //OPTION 1
$yesCheck = rand(1,$possibilities);   //OPTION 2


($yesCheck == $yes) ? return true : return false;

Does it give the same results with either option?

+1  A: 

The way most random generators work, the output is not truly random but rather based on an algorithm that should make the returned values appear random and distributed. Based on that I think the real probability that you actually get the same number twice in a row is even less than in the real world if you had two totally unrelated "random generators".

Edit: Having said that I have no inside info whatsoever about how the default random generator in php works.

Fredrik
+1 for pointing out the non-randomness of most PRNGs.
paxdiablo
A: 

Option 1 is guaranteed to be correct.

I don't think I learned enough probability and statistics back in the day to say whether Option 2 is correct.

I did, however, learn that one should NEVER trust someone else's random number generator without running test programs on it, to get an idea of how random it REALLY is.

In your case, I'd run a few million test cases through both options, and see whether Option 2 came up with similar statistics to Option 1.

John R. Strohm
In *theory* they are the same.
Andrew Medico
A: 

In theory, yes, the two expressions have exactly the same probablity of being true. That's assuming PHP's random number generator is in fact random - if it isn't, one will be more likely than the other.

The best approach would be to run an experiment (thousands of iterations) and see what happens.

Andrew Medico
That is certainly *not* the best approach. Experimental evidence can only give, at best, an indication, even with *millions* of iterations. The best approach is a deterministic one where you examine the algorithm.
paxdiablo
+3  A: 

IF the random number generator is truly random then both approaches produce the same result. However, computer random number generators aren't perfect. I doubt the flaws are enough to matter but the only way to know for sure is to try it--run a test for as long as possible and see if there is a deviation from what it should be. You'll need millions of random numbers at a minimum.

Loren Pechtel
+1 for taking into account the non-randomness of most computer-based "random" number generators.
paxdiablo
+10  A: 

Let the data speak for itself.

Code

vinko@parrot:~$ more rand.php
<?php

$randrandsum = 0;
$randconstsum = 0;
$count = 20;
for ($j = 0; $j < $count; $j++) {
        $randrand = 0;
        $randconst = 0;
        for ($i = 0; $i < 10000000; $i++ ){
                $a = rand(1,1000);
                $b = rand(1,1000);
                if ($a == $b) $randrand++;
        }
        for ($i = 0; $i < 10000000; $i++ ){
                $a = rand(1,1000);
                $c = 1000;
                if ($c == $a) $randconst++;
        }
        $randrandsum += $randrand;
        $randconstsum += $randconst;
        print ($j+1)." RAND-RAND: $randrand RAND-CONST: $randconst\n";
}
print "AVG RAND-RAND: ".($randrandsum/$count);
print " AVG RAND-CONST: ".($randconstsum/$count)."\n";
?>

Test run

vinko@parrot:~$ php rand.php
1 RAND-RAND: 10043 RAND-CONST: 10018
2 RAND-RAND: 9940 RAND-CONST: 10132
3 RAND-RAND: 9879 RAND-CONST: 10042
4 RAND-RAND: 9878 RAND-CONST: 9965
5 RAND-RAND: 10226 RAND-CONST: 9867
6 RAND-RAND: 9866 RAND-CONST: 9992
7 RAND-RAND: 10069 RAND-CONST: 9953
8 RAND-RAND: 9967 RAND-CONST: 9862
9 RAND-RAND: 10009 RAND-CONST: 10060
10 RAND-RAND: 9809 RAND-CONST: 9985
11 RAND-RAND: 9939 RAND-CONST: 10057
12 RAND-RAND: 9945 RAND-CONST: 10013
13 RAND-RAND: 10090 RAND-CONST: 9936
14 RAND-RAND: 10000 RAND-CONST: 9867
15 RAND-RAND: 10055 RAND-CONST: 10088
16 RAND-RAND: 10129 RAND-CONST: 9875
17 RAND-RAND: 9846 RAND-CONST: 10056
18 RAND-RAND: 9961 RAND-CONST: 9930
19 RAND-RAND: 10063 RAND-CONST: 10001
20 RAND-RAND: 10047 RAND-CONST: 10037
AVG RAND-RAND: 9988.05 AVG RAND-CONST: 9986.8

Given the above results I'd say that for all practical purposes, both options are equivalent, giving the expected 1/1000 result on both cases.

Vinko Vrsalovic
+1, although I have to answer your "Let the data speak for itself" with http://www.dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/2000/300/2318/2318.strip.gif
balpha
Well, in this case both results are the expected ~1/1000 so there's no smell
Vinko Vrsalovic
This test methodology really is not correct, because it always calls rand() twice in each iteration. A more accurate test would run one loop testing rand(1,1000)==1000 and a second loop testing rand(1,1000)==rand(1,1000).
Andrew Medico
Although, as you say, the fact that both end up with the expected 1/1000 allieviates that concern.
Andrew Medico
Okay, I'll update with a better test!
Vinko Vrsalovic
Great test! I think this answers my question. I am quite curious about the better test though ;)
Ropstah
@ropstah: If you check the edit history you'll see that what's currently there already is the better test :)
Vinko Vrsalovic
You should not rely on experimental evidence - at best it's an indication only. A better approach would be analysis of the random algorithm to see if the premise is correct.
paxdiablo
Technically, you are correct, practically, you're exaggerating. Mathematically, as implied in the last line, the two expressions are equivalent and, in practice over many iterations, they seem to be. So, while being an indication and not proof, it is indeed a very strong indication. I'd argue that this indication is really strong enough as to make no difference for the calculation of a binary value. To get a real proof you would have to analyze the RNG of the language of choice, which probably is more work than it deserves. Although I'd very much like to see that, care to elaborate? :-)
Vinko Vrsalovic
@Vinko, I didn't mean to slight your method (+1 by way of apology). It indicates (quite strongly as you point out) that the PHP RNG is a decent one. Many linear congruential ones will almost guarantee not getting the same number twice in a row over the entire range (though the modulo 1000 may allow it to happen more often). I was just stating I prefer hard evidence or reasoning to empirical evidence. And I agree about the workload of proving it. Since most (non-mathematician) people don't really care how random their random numbers are, it's not worth it.
paxdiablo
+8  A: 

Yes, rand(1,1000) = 1000 is just as probable as rand(1,1000) = rand(1,1000).

Imagine rolling two dices. After you rolled the first one, what is the probability the second one will equal the first when rolled? 1/6.

Now write down a number between 1 and 6 and roll a dice. What is the probability the dice will equal what you have just written? 1/6.

Snut
Good explanation, but I couldn't make out for sure if the rolling a dice once and have it match a number would be more probable then rolling it twice (or roll two dice) and make it show the same number...
Ropstah
I have to disagree with this answer. While that first sentence is true for truly random events, random number generators on computers are not generally truly random. In fact, if you use a linear congruential generator, it's almost a certainty that two consecutive numbers would be the same (subject to modulo).
paxdiablo
+2  A: 

This does not directly address your question, but you may want to look at mt_rand() instead. The PHP documentation states:

Many random number generators of older libcs have dubious or unknown characteristics and are slow. By default, PHP uses the libc random number generator with the rand() function. The mt_rand() function is a drop-in replacement for this. It uses a random number generator with known characteristics using the » Mersenne Twister, which will produce random numbers four times faster than what the average libc rand() provides.

From http://www.php.net/manual/en/function.mt-rand.php.

Matt Nizol