ansaurus

Question

Why do digits 1, 2 and 3 appear so frequently using C rand() function?

Answer 1

+4 A:

Looks like Benford's Law - see http://en.wikipedia.org/wiki/Benford%27s_law, or alternatively a not very good RNG.

anon 2010-08-01 09:30:12

Benford's law was my first thought as well, but doesn't it hold just for "real-life" data, i.e. empirically retrieved data?

phimuemue 2010-08-01 09:35:47

1.23% of statistics will not comply with Benford's law, except for on 3/12/2013. Sorry - couldn't resist. My belief is that this is indeed just for real-life data.

Will A 2010-08-01 09:40:34

Benford's Law explains the same observation but not under the given circumstances. I assume a pseudo-random uniform distribution. Benford's law applies to distributions whose logarithms are uniformly distributed.

Peter G. 2010-08-01 09:40:50

1111222334 Nice link

Matt Joiner 2010-08-02 06:19:54

Answer 2

+35 A:

rand() generates a value from 0 to RAND_MAX. RAND_MAX is implementation-defined and must be at least 32767; common values are 32767 (e.g. with Microsoft's C runtime) and 2147483647 (e.g. with glibc).

In your example, RAND_MAX appears to be 32767. Every value from 10000 to 32767 has 1, 2 or 3 as its most significant digit, and those values make up more than two thirds of the range, so these digits show up unusually often. To a lesser degree, digits up to 6 and 7 will also be slightly favored.

Matt Joiner 2010-08-01 09:30:49

Beat me to it - good call.

Will A 2010-08-01 09:34:31

Why should 6 and 7 be slightly favored?

Hippo 2010-08-01 09:35:30

'cause for any number >= 32700, the fourth digit can be at most 6. For any number >= 32760, the fifth digit can be at most 7.

Will A 2010-08-01 09:42:13

Much more important than the bias for six and seven is the bias against zero. 00012 is pretty-printed "12" but 11112 is pretty-printed "11112". All the leading zeroes that would balance the statistics if the range were a power of ten are omitted by `printf`.

Pascal Cuoq 2010-08-01 09:54:07

Thanks Will and Pascal, very good observations/points.

Matt Joiner 2010-08-01 11:33:45

Answer 3

A:

rand() implementations vary wildly. IIRC, linear congruential generators tend to have less randomness in the lower-order bits. Quoting from the Linux rand() man page:

The versions of rand() and srand() in the Linux C Library use the same random number generator as random(3) and srandom(3), so the lower-order bits should be as random as the higher-order bits. However, on older rand() implementations, and on current implementations on different systems, the lower-order bits are much less random than the higher-order bits. Do not use this function in applications intended to be portable when good randomness is needed.

ninjalj 2010-08-01 09:31:10

Answer 4

+1 A:

That's because you generate numbers between 0 and RAND_MAX. The generated numbers are evenly distributed (i.e. approx. the same probability for each number); however, the digits 1, 2 and 3 occur more often than the others in this range. Try generating numbers between 0 and 9, where each digit occurs with the same probability, and you'll get a nice distribution.

phimuemue 2010-08-01 09:34:06

Answer 5

+15 A:

KennyTM 2010-08-01 10:26:08

Answer 6

A:

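When you want a random value from the range [0, x), instead of doing rand()%x you should apply the formula x*((double)rand()/(RAND_MAX+1.0)). Dividing by RAND_MAX+1.0 rather than RAND_MAX keeps the result strictly below x. When RAND_MAX+1 is not a multiple of x, some values are still hit slightly more often than others, but the surplus is spread across the whole range instead of piling up on the smallest values.

Say RAND_MAX is equal to 15, so rand() gives you integers from 0 to 15. When you use the modulo operator to get random numbers from [0, 10), the values [0,5] will have a higher frequency than [6,9], because 3 == 3%10 == 13%10, while e.g. 7 is produced only by 7.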

el.pescado 2010-08-01 10:42:05

Answer 7

+1 A:

If I understand what the OP (person asking the question) wants, they want to make better random numbers.

rand() and random(), quite frankly, don't make very good random numbers; they both do poorly when tested against diehard and dieharder (two packages for testing the quality of random numbers).

The Mersenne twister is a popular random number generator which is good for pretty much everything except crypto-strong random numbers; it passes all of the diehard(er) tests with flying colors.

If one needs crypto-strong random numbers (numbers that can not be guessed, even if someone knows which particular crypto-strong algorithm is being used), there are a number of stream ciphers out there. The one I like to use is called RadioGatún[32], and here’s a compact C representation of it:

/*Placed in the public domain by Sam Trenholme*/
#include <stdint.h>
#include <stdio.h> 
#define p uint32_t
#define f(a) for(c=0;c<a;c++)
#define n f(3){b[c*13]^=s[c];a[16+c]^=s[c];}k(a,b 
k(p *a,p *b){p A[19],x,y,r,q[3],c,i;f(3){q[c]=b[c
*13+12];}for(i=12;i;i--){f(3){b[c*13+i]=b[c*13+i- 
1];}}f(3){b[c*13]=q[c];}f(12){i=c+1+((c%3)*13);b[
i]^=a[c+1];}f(19){y=(c*7)%19;r=((c*c+c)/2)%32;x=a
[y]^(a[(y+1)%19]|(~a[(y+2)%19]));A[c]=(x>>r)|(x<<
(32-r));}f(19){a[c]=A[c]^A[(c+1)%19]^A[(c+4)%19];
}a[0]^=1;f(3){a[c+13]^=q[c];}}l(p *a,p *b,char *v
){p s[3],q,c,r,x,d=0;for(;;){f(3){s[c]=0;}for(r=0
;r<3;r++){for(q=0;q<4;q++){if(!(x=*v&255)){d=x=1;
}v++;s[r]|=x<<(q*8);if(d){n);return;}}}n);}}main(
int j,char **h){p a[39],b[39],c,e,g;if(j==2){f(39
){a[c]=b[c]=0;}l(a,b,h[1]);f(16){k(a,b);}f(4){k(a
,b);for(j=1;j<3;++j){g=a[j];for(e=4;e;e--){printf
("%02x",g&255);g>>=8;}}}printf("\n");}}

There are also a lot of other really good random number generators out there.

samiam 2010-08-01 11:41:50
