tags:

views:

436

answers:

6

Every language has a random() function or something similar to generate a pseudo-random number. I am wondering what happens underneath to generate these numbers? I am not programming anything that makes this knowledge necessary, just trying to satisfy my own curiosity.

Thanks.

+7  A: 

The entire first chapter of Donald Knuth's seminal work Seminumerical Algorithms is taken up with the subject of random number generation. I really don't think an SO answer is going to come close to describing the issues involved. Read the book.

anon
Has he updated the book within the last few decades? It's likely not the best source anymore.
David Thornley
And your alternative is ... what?
anon
I don't think much has changed in the fundamentals of random numbers in the past decades, at least not in the mainstream. One could always look into theoretical research papers on random generation, but I'd recommend Knuth as a springboard anyway.
Kena
Knuth is comprehensive on this subject but also not particularly accessible.
Dan Dyer
@Kena: The underling math may not have changed, but there have been several new implementation that provide much better properties for simulation. MT was a big step forward, and *it* has been surpassed. And the state of the cryptographic art has move forward enormously.
dmckee
I never looked into Knuth's book directly, but it's still cited as a source for many publications about random numbers. Names to look for nowadays are Pierre L'Ecuyer (author of SSJ and co-developer of the WELL generators) and Makoto Matsumoto (developer of the Mersenne Twister and co-developer of the WELL generators) at least. George Marsaglia is also a pretty big name, though not so much in recent times, as far as I have noticed.
Joey
+4  A: 

The Wikipedia page is a good reference.

The actual algorithm used is going to be dependent on the language and the implementation of the language.

Chi
A: 

To exactly answer you answer, the random function is provided by the operation system (usually).

But how the operating system creates this random numbers is a specialized area in computer science. See for example the wiki page posted in the answers above.

Henri
No, the random fonction is always provided by the language system. It may use the operating system to get some entropy, but the the C rand() function (typically) does not.
anon
I have to agree with Neil, many OSs provide random and pseudo random number APIs, but many standard libraries proved *separate* implementation. In anycase, until you've read the docs, you must assume that they are all crappy linear congruent jobs not safe for science, secrecy, or anything involving money. The docs will confirm this about some, then you check out any claims made by the others.
dmckee
+3  A: 

random() is a so called pseudorandom number generator (PRNG). random() is mostly implemented as a Linear congruential generator. This is a function of the form X(n+1) (aXn +c) modulo m. Xn is the sequence of generated pseudorandom numbers. The genarated sequence of numbers is easy guessable. This algorithm can't be used as a cryptographically safe PRNG.

Wikipedia:Linear congruential generator

And take a look at the diehard tests for PRNG PRNG Diehard Tests

Thomas Maierhofer
You forgot to mention that the random numbers produced by an LCG are somewhere between mediocre and awful.
starblue
Yes you're right but for some quick and dirty random number generation it is ok. Do you know the DIEHARD statistical tests on PRNG? Most PRNG fail in some manner in this tests.
Thomas Maierhofer
TestU01 by L'Ecuyer is also a nice testing framework nowadays. Only downside is that it's designed to find flaws in nearly all generators :-)
Joey
+4  A: 

It turns out to be surprisingly easy to get half-way-decent pseudorandom numbers. For decades the gold standard was a remarkably simple algorithm: keep state x, multiply by constant A (32x32 => 64 bits) then add constant B, then return the low 32-bits, which also become the new x. If A and B are chosen carefully this actually works fairly well.

Pseudorandom numbers need to be repeatable, too, in order to reproduce behavior during debugging. So, seeding the generator (initializing x with, say, the time-of-day) is typically avoided during debugging.

In recent years, and with more compute cycles available to burn, more sophisticated algorithms are available, some of them invented since the publication of the otherwise quite authoritive Seminumerical Algorithms. Operating systems are also starting to provide hardware and network-derived entropy bits for specialized cryptographic purposes.

DigitalRoss
(a) an LCG does *not* produce halfway decent pseudorandom numbers, actually the results are pretty horrible to disastrous, depending on how you choose factor, addend and modulus. (b) Better generators does not always mean that they are slower. For example, MT19937 is an excellent generator (though superseded now) which runs faster than most LCG implementations. (c) While it may help in debugging when a PRNG produces repeatable output, it is absolutely mandatory in simulations.
Joey
Please reread the question. He wasn't asking for the best algorithm, the question was: how do they work?Yes, the Mersenne twister is a better algorithm. It's what Ruby uses, and if the question was "what's best" we might have discussed it. It's interesting to note that some MT implementations use an LCRNG for initialization.
DigitalRoss
Sure, the question asks for how it's done but you are telling that it's easy to gett decent random numbers with an LCG, which is just wrong. As for MT's initialization, if done correctly, this suffices. What matters is (a) that if you use multiple RNGs that your *seeds* don't have linear dependencies and (b) your generators doesn't introduce one. MT19937 has been conceived in a way that the recommended initialization method doesn't affect the quality of the output.
Joey
No, I didn't say that. I said "half-way-decent". That's not the same at all. Remember, the Mersenne twister was not even *invented* until 1997 and took some time to catch on. You have a good chance of encountering something based on the LCRNG. If I was trying to write an essay I might have covered the various improvements on the LCRNG that made it rule for so long, but it's just a SO question. He needs to go to wikipedia. (It has been a long time since I read SNA but IIRC he gave a good review to the LCRNG when `A` and `B` were carefully chosen. Have you read Knuth more recently?)
DigitalRoss
It would be interesting to know which libraries and languages used which algorithms...
DigitalRoss
A: 

One thing you might want to examine is the family of random devices available on some Unix-like OSes like Linux and Mac OSX. For example, on Linux, the kernel gathers entropy from a variety of sources into a pool which it then uses to seed it's pseudo-random number generator. The entropy can come from a variety of sources, the most notable being device driver jitter from keypresses, network events, hard disk activity and (most of all) mouse movements. Aside from this, there are other techniques to gather entropy, some of them even implemented totally in hardware. There are two character devices you can get random bytes from and on Linux, they behave in the following way:

  • /dev/urandom gives you a constant stream of bytes which is very random but not cryptographically safe because it reuses whatever entropy is available in the pool.
  • /dev/random gives you cryptographically safe random numbers but it won't give you a constant stream as it uses the entropy available in the pool and then blocks while more entropy is collected.

Note that while Mac OSX uses a different method for it's PRNG and therefore does not block, my personal benchmarks (done in college) have shown it to be every-so-slightly less random than the Linux kernel. Certainly good enough, though.

So, in my projects, when I need randomness, I typically go for reading from one of the random devices, at least for the seed for an algorithm in my program.

yonkeltron
`/dev/urandom` is still a cryptographically secure PRNG. Those are designed so the generator's state can't be guessed easily (in contrast to true RNGs which exist because their state can't be guessed). In any event, even for the vast majority of applications needing security, `/dev/urandom` is a sane choice. You will definitely have fun when trying to use `/dev/random` (and then wait for your application to finish), especially with multiple processes or users on the same system :-)
Joey