Given a series of randomly generated data, how can I figure out how random it actually is? Is R a good tool for this, or MATLAB? What other questions can these tools answer about randomly generated data? Is there another tool that is better suited to this?

+1  A: 

As always, there's a toolbox for it.

zellus
+9  A: 

The DieHarder test battery by Robert G. Brown, which reimplements and extends the old DIEHARD by Marsaglia et al., has been wrapped into the R package RDieHarder, which you could start with.

Note that each RDieHarder version needs its particular matching DieHarder release, and we're not there yet for the most recent development version of the latter.

Edit: Also, for the subset of cryptographic tests, the NIST suite (which is included in DieHarder) should be appropriate, as that is what it was designed for.

Dirk Eddelbuettel
+3  A: 

I recommend reading Chapter 10 of Beautiful Testing: Testing a Random Number Generator. It's a little more approachable than most texts on the topic. Maybe, if we're nice, the author of that chapter, John Cook, might stop by and give his input.

JD Long
+3  A: 

According to Wikipedia (Randomness):

The central idea is that a string of bits is random if and only if it is shorter than any computer program that can produce that string (Kolmogorov randomness) — this means that random strings are those that cannot be compressed.

Therefore, given the random stream of numbers, save it to a file and compress it using your favorite tool (zip, rar, ...). The compression ratio can be interpreted as a measure of randomness. Even better, I would use it as a relative score to compare the randomness of two data series.
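A minimal sketch of the compression idea in Python, assuming zlib stands in for "your favorite tool" and that the data is already available as bytes. The helper name `compression_ratio` and both sample inputs are made up for illustration. Keep in mind that a ratio near 1 only shows that this particular compressor found no structure; the output of a deterministic generator can look just as incompressible.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size / original size (lower means more compressible)."""
    if not data:
        raise ValueError("need at least one byte of data")
    return len(zlib.compress(data, level)) / len(data)


# Illustrative inputs (assumptions, not data from the question):
random_bytes = os.urandom(100_000)           # bytes from the OS entropy source
patterned_bytes = bytes(range(256)) * 400    # an obviously repeating pattern

ratio_random = compression_ratio(random_bytes)
ratio_patterned = compression_ratio(patterned_bytes)

print(f"random:    {ratio_random:.3f}")
print(f"patterned: {ratio_patterned:.3f}")
```

As suggested above, the ratio is most useful as a relative score: compute it for two series of the same length and format, and compare.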

Amro
+1  A: 

For theory, the above-mentioned reference by Knuth is useful, and, to link to Amro's response, there is related work by Li & Vitanyi.

Vishal Belsare
Cilibrasi did some work on implementing ideas along these lines; however, the code isn't a straightforward R / MATLAB package/toolbox.
Vishal Belsare
+4  A: 

First you need to decide what kind of randomness you're testing for. Do you have in mind a uniform distribution inside some range? That's usually what people have in mind, though you may have some other flavor of randomness such as a normal distribution.

Once you have a candidate distribution, you can test the goodness of fit to that distribution. The Kolmogorov-Smirnov test is a good general-purpose test. I believe it's called ks.test in R. But I also believe it assumes a continuous distribution, and therefore distinct values, so that could be a problem if you're sampling from such a small range of values that the same value appears more than once.
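To make the goodness-of-fit idea concrete, here is a rough Python sketch of the one-sample Kolmogorov-Smirnov statistic against Uniform(0, 1), using only the standard library. The function names, the seed, and the two sample series are assumptions made for illustration, and the p-value uses the large-sample Kolmogorov series, so it is an approximation. In R, the equivalent check would be a call such as `ks.test(x, "punif")`.

```python
import math
import random


def ks_statistic_uniform(sample):
    """One-sample KS statistic D against Uniform(0, 1), whose CDF is F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Large-sample p-value from the Kolmogorov series (an approximation)."""
    t = math.sqrt(n) * d
    if t < 0.2:
        return 1.0  # the tail probability is indistinguishable from 1 here
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


rng = random.Random(12345)  # fixed seed so the run is repeatable
n = 2000
uniform_sample = [rng.random() for _ in range(n)]
skewed_sample = [rng.random() ** 2 for _ in range(n)]  # in [0, 1) but not uniform

d_uniform = ks_statistic_uniform(uniform_sample)
d_skewed = ks_statistic_uniform(skewed_sample)
print("uniform:", d_uniform, ks_pvalue_asymptotic(d_uniform, n))
print("skewed: ", d_skewed, ks_pvalue_asymptotic(d_skewed, n))
```

A large D with a tiny p-value is evidence against the candidate distribution; a small D only means this test found nothing wrong.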

S. Lott mentioned Knuth's Seminumerical Algorithms in the comments. That book has a good introduction to the chi-squared test and the Kolmogorov-Smirnov tests for goodness of fit.
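For values drawn from a small discrete range, where ties make the Kolmogorov-Smirnov test awkward, the chi-squared test is the more natural fit. A minimal sketch in Python, assuming the data are integers in `range(k)` and that the hypothesis is equal frequencies; the function name, seed, and sample series are illustrative. The critical value 16.919 is the standard 5% point of the chi-squared distribution with 9 degrees of freedom (10 categories minus 1).

```python
import random
from collections import Counter


def chi_squared_uniform(values, k):
    """Pearson chi-squared statistic for integers in range(k) vs. equal frequencies."""
    n = len(values)
    if n == 0 or k < 2:
        raise ValueError("need data and at least two categories")
    expected = n / k
    counts = Counter(values)
    return sum((counts.get(c, 0) - expected) ** 2 / expected for c in range(k))


CRITICAL_5PCT_DF9 = 16.919  # chi-squared 5% critical value for 9 degrees of freedom

rng = random.Random(2024)  # fixed seed so the run is repeatable
fair = [rng.randrange(10) for _ in range(10_000)]
# Biased series: the digit 9 is roughly twice as likely as any other digit.
biased = [min(rng.randrange(11), 9) for _ in range(10_000)]

stat_fair = chi_squared_uniform(fair, 10)
stat_biased = chi_squared_uniform(biased, 10)
print("fair:  ", stat_fair, "reject" if stat_fair > CRITICAL_5PCT_DF9 else "no evidence")
print("biased:", stat_biased, "reject" if stat_biased > CRITICAL_5PCT_DF9 else "no evidence")
```

The usual rule of thumb is to keep the expected count per category comfortably above 5, which holds here with 1000 expected per digit.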

If you do suspect you have uniform random values, the DIEHARD test that Dirk Eddelbuettel mentioned is a standard test.

John D. Cook
Well, I mentioned *DieHarder*, which is different from DIEHARD: it reimplements DIEHARD and extends it with a bunch of tests not found in the original, including some from NIST and some that Robert and collaborators worked on.
Dirk Eddelbuettel
Sorry, I missed that. I like the Bruce Willis movie allusion. :)
John D. Cook
Yup. Robert is a fun guy.
Dirk Eddelbuettel
It's actually `ks.test`.
nico
@nico: fixed it.
Richie Cotton
