views:

48

answers:

2

I am putting together an internal "every developer should know" wiki page.

I saw many discussions regarding rand() % N, but not a single web page that explains it all.

For instance, I am curious if this problem is only C- and Linux-specific, or if it also applies to Windows, C++,. Java, .Net, Python, Perl.

Please help me get to the bottom of this. Also, just how non-random do the numbers get? Thank you!

+2  A: 

Check out http://en.wikipedia.org/wiki/Linear_congruential_generator, which is likely the algorithm used for most built-in random number generators.

Scrolling down, look for the paragraph beginning with "A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period.." for some insight into rand() % N.

brainjam
+1  A: 

I don't have a web page to refer you to but I might have a "back of the envelope" explanation that would help. The way simple random number generators work is by following the steps

  1. Use the last number generated n or a seed number.
  2. Multiply that number by a special large number
  3. Add another special large number
  4. Divide that by a third special large number and throw away the remainder
  5. Return the result

Now if you think about what happens in all but step 4 you are doing operations where only the lower bits can alter the lower bits of the result. Adding 1001 and 100...00001 will end in ...02 (Ha you though I was talking base 2, really these number are base 12 for giggles.) regardless of what is on the high end of the calculation. Similarly, when you multiply it will end in a 1, no matter what.

There is a similar problem at the top end as well, a billion times a billion will invariably dominate the contribution of the hundreds places of wither number. This points to the fact that the middle is where the good stuff happens. Lots of bits interact here--high, middle, and low.

That is the purpose of the division step, it cuts off the bottom chunk of the result where there was not as much interaction. The top chunk is not usually chopped off because the computer drops the upper bits when the multiplications do not fit into a machine word any more.

In the end though the cut off points are somewhat arbitrary and you can be more picky than the people who designed the algorithm and still chop off a few more bits.

For you question of how bad they can be, they can be really bad. The easiest way to see this is to group individual numbers into tuples and graph them. So if you had random numbers a, b, c, d, ... graph (a,b), (c,d), ... and look at the results. This is called a Spectral Test and Rand fails it beautifully. This one I have a link for try http://random.mat.sbg.ac.at/results/karl/spectraltest/

Ukko