views:

266

answers:

3

I have read that it has something to do with time, also you get from including time.h, so I assumed that much, but how does it work exactly? Also, does it have any tendencies towards odd or even numbers or something like that? And finally is there something with better distribution in the C standard library or the Foundation framework?

+8  A: 

rand returns numbers generated by a pseudo-random number generator (PRNG). The sequence of numbers it returns is deterministic, based on the value with which the PRNG was initialized (by calling srand).

The numbers should be distributed such that they appear somewhat random, so, for example, odd and even numbers should be returned at roughly the same frequency. The actual implementation of the random number generator is left unspecified, so the actual behavior is specific to the implementation.

The important thing to remember is that rand does not return random numbers; it returns pseudo-random numbers, and the values it returns are determined by the seed value and the number of times rand has been called. This behavior is fine for many use cases, but is not appropriate for others (for example, rand would not be appropriate for use in many cryptographic applications).

James McNellis
How do the numbers change related to how many times it has been called?
Regan
Also, what is the difference between it and arc4random()?
Regan
@Regan: It depends on the PRNG being used, and I don't know what `arc4random` is.
James McNellis
"How do the numbers change"? They don't change. They're pseudo-random numbers. From a given seed, they produce the same random-seeming sequence. What are you asking?
S.Lott
James said that the values returned are dependent on the seed value and the number of times rand has been called and I was wondering how those things affect the values returned.
Regan
Each call to `rand()` returns a freshly computed pseudo-random value, and updates its internal state (aka the seed). Calling `srand()` allows you to set the seed so that you can repeat the same sequence of values from `rand()`, which is occasionally valuable when debugging. James is saying that the sequence of values returned by successive calls to `rand()` is deterministic, and depends on the seed set by `srand()`.
RBerteig
@Regan: Please, please, please, find a reference on random numbers. Your question makes very little sense. The numbers don't change -- they always pseudo random. No matter how many times you generate them or what seed you use, they're always "random" appearing numbers. No matter how many you generate, they pass all kinds of statistical tests. What are you asking?
S.Lott
A: 

How does rand() work?

http://en.wikipedia.org/wiki/Pseudorandom_number_generator

I have read that it has something to do with time, also you get from including time.h

rand() has nothing at all to do with the time. However, it's very common to use time() to obtain the "seed" for the PRNG so that you get different "random" numbers each time your program is run.

Also, does it have any tendencies towards odd or even numbers or something like that?

Depends on the exact method used. There's one popular implementation of rand() that alternates between odd and even numbers. So avoid writing code like rand() % 2 that depends on the lowest bit being random.

dan04
is that the `seed+prime*steps` implementation you're talking about?
sreservoir
+3  A: 

Briefly:

  • You use time.h to get a seed, which is an initial random number. C then does a bunch of operations on this number to get the next random number, then operations on that one to get the next, then... you get the picture.

  • rand() is able to touch on every possible integer. It will not prefer even or odd numbers regardless of the input seed, happily. Still, it has limits - it repeats itself relatively quickly, and in almost every implementation only gives numbers up to 32767.

  • C does not have another built-in random number generator. If you need a real tough one, there are many packages available online, but the Mersenne Twister algorithm is probably the most popular pick.

Now, if you are interested on the reasons why the above is true, here are the gory details on how rand() works:

rand() is what's called a "linear congruential generator." This means that it employs an equation of the form:

xn+1 = (a*xn + b) mod m

where xn is the nth random number, and a and b are some predetermined integers. The arithmetic is performed modulo m, with m usually 232 depending on the machine, so that only the lowest 32 bits are kept in the calculation of xn+1.

In English, then, the idea is this: To get the next random number, multiply the last random number by something, add a number to it, and then take the last few digits.

A few limitations are quickly apparent:

  • First, you need a starting random number. This is the "seed" of your random number generator, and this is where you've heard of time.h being used. Since we want a really random number, it is common practice to ask the system what time it is (in integer form) and use this as the first "random number." Also, this explains why using the same seed twice will always give exactly the same sequence of random numbers. This sounds bad, but is actually useful, since debugging is a lot easier when you control the inputs to your program

  • Second, a and b have to be chosen very, very carefully or you'll get some disastrous results. Fortunately, the equation for a linear congruential generator is simple enough that the math has been worked out in some detail. It turns out that choosing an a which satisfies *a*mod8 = 5 together with b = 1 will insure that all m integers are equally likely, independent of choice of seed. You also want a value of a that is really big, so that every time you multiply it by xn you trigger a the modulo and chop off a lot of digits, or else many numbers in a row will just be multiples of each other. As a result, two common values of a (for example) are 1566083941 and 1812433253 according to Knuth. The GNU C library happens to use a=1103515245 and b=12345. A list of values for lots of implementations is available at the wikipedia page for LCGs.

  • Third, the linear congruential generator will actually repeat itself because of that modulo. This gets to be some pretty heady math, but the result of it all is happily very simple: The sequence will repeat itself after m numbers of have been generated. In most cases, this means that your random number generator will repeat every 232 cycles. That sounds like a lot, but it really isn't for many applications. If you are doing serious numerical work with Monte Carlo simulations, this number is hopelessly inadequate.

  • A fourth much less obvious problem is that the numbers are actually not really random. They have a funny sort of correlation. If you take three consecutive integers, (x, y, z), from an LCG with some value of a and m, those three points will always fall on the lattice of points generated by all linear combinations of the three points (1, a, a2), (0, m, 0), (0, 0, m). This is known as Marsaglia's Theorem, and if you don't understand it, that's okay. All it means is this: Triplets of random numbers from an LCG will show correlations at some deep, deep level. Usually it's too deep for you or I to notice, but its there. It's possible to even reconstruct the first number in a "random" sequence of three numbers if you are given the second and third! This is not good for cryptography at all.

The good part is that LCGs like rand() are very, very low footprint. It typically requires only 32 bits to retain state, which is really nice. It's also very fast, requiring very few operations. These make it good for noncritical embedded systems, video games, casual applications, stuff like that.

PRNGs are a fascinating topic. Wikipedia is always a good place to go if you are hungry to learn more on the history or the various implementations that are around today.

Spencer Nelson
@Spencer: For C, what you describe is just one possible implementation of `rand`. Different C implementations have different PRNGs, and some older ones can be really bad even for non-cryptographic purposes. I don't know about Objective C, maybe it does mandate a particular RNG.
Gilles
@Giles: Really? They all use LCGs, but use use different coefficients according to *Computational Physics* by Nicholas J. Giordano and Hisao Nakanishi (2nd ed). Early versions of `rand()` picked very poor coefficients. RANDU is probably the really bad implementation you are thinking of. It came out before LCGs were well understood and took ***a***=65539 and ***b***=0 with famously disastrous results.There weren't really other small PRNGs available when C was developed though, as far as I know. What other implementations do you think were used?
Spencer Nelson
@Spencer Nelson: GCC doesn't "use" any implementation of `rand()`. GCC is a compiler, `rand( )` is provided by the host system's libc. The gnu libc may use a specific implementation across platforms.
Stephen Canon
Whoops, good catch Stephen. Corrected.
Spencer Nelson