views:

1625

answers:

12

Hi all,

I have a pseudo random number generator (PRNG) class that I want to unit test. There are two approaches:

  1. write a test case that takes a large amount of samples and test whether they are properly distributed. This approach may lead to a fairly long execution time for the test case;
  2. calculate a small series of samples 'by hand' and verify if the PRNG algorithm reproduces it. This approach may lead to a not random sequence being generated without being noticed;

I would say that the first approach is not really unit testing because it does not perform a white box test of the generator, but on the other hand it properly tests the responsibility of the class. The second approach is more like a real unit test, focusing on the algorithm, but it does not provide as much evidence as to whether the class fulfills its responsibility.

Which approach do you prefer, and why?


+7  A: 

For testing a PRNG, I would use ENT which is a suite of statistical tests that will tell you how well your PRNG performs. I suppose this is approach 1.

Greg Hewgill
+4  A: 

I would imagine that ultimately you would want both tests - because you want to be sure that both the following hold true:

(1) the numbers are properly distributed and (2) the specific algorithm is working as expected.

Perhaps the first test could only be run occasionally, whereas the second could be use to check that any code changes hadn't broken the algorithm.

Mark Pattison
+22  A: 

Surely you haven't invented your own PRNG algorithm?

So get another implementation of the same algorithm, generate a smallish number of longish test cases based on known seeds, and verify that your implementation of the algorithm matches everyone else's implementations. The more data you test, the more chance it does. If you want to be serious, look into how FIPS validation is done for the algorithm.

There's no need to test whether the output is random, since far more research has been done on the algorithm by others than you are capable of reproducing.

If I had invented my own PRNG, without already knowing how to test it, then my preferred approach would be to scrap it and use one which works.

Steve Jessop
This is a really good point. Unit testing tests your implementation - in this case that you have implemented the algorithm correctly. It cannot tell you that you did the right thing - only that you did it right.
Draemon
"If I had invented my own PRNG, without already knowing how to test it, then my preferred approach would be to scrap it and use one which works." Amen!
Brad Wilson
However, you didn't answer his question of how to test it if you HAVE written your own PRNG. You just wrote off the question and answered something else.
The question mentions a PRNG class which requires testing. I answer by saying that it should be tested by comparison with other implementations of the same algorithm, or using FIPS validation procedures for the algorithm. If you need to know about PRNG algorithm design, just ask another question.
Steve Jessop
+3  A: 

I believe your first point (#1) gets more at testing the quality of the random numbers generated, which is dictated by the algorithm being used. The second point (#2) gets more at testing the implementation of the algorithm. If you designed the algorithm, both tests are important. If you implemented an algorithm of demonstrated performance, #2 should be sufficient. Although, I would probably test more than a few seeds and the sequences that resulted using some knowledge of the structure of your particular generator.

Tall Jeff
A: 

Strictly, there's no way to test if the random generator is really random :-) First approach gives you the knowledge of can it keed distribution proper or not for a fixed amount of samples only, no matter how big this ammount is. The second approach can suport knowlege of does it behave like an algorythm, but again for a fixed ammount of samples.

The best you can do - use both.

akalenuk
+2  A: 

Plesae be aware: If you have 'invented' your PRNG, you will probably have got it wrong and produced something that has a less than optimal distribution. The basic test for how random your generator is, is the Chi-Square test

Mitch Wheat
A: 

Roll a casino die 10 thousand times and then compare the outputs ;)

Robert Gould
A: 
Omar Kooheji
+3  A: 

The "randomness" in a pseudo-random number generator is generally expressed as the average number of iterations before a number repeats. There are numerous algorithms that have varying "randomness" and performance. Random.org has a good explanation of some analysis done on their algorithms. Check out the pictures about halfway down the page. It's easy to see in the static images just how random the two algorithms are.

One feature of a PRNG (a true feature, not a bug disguised as a feature) is that it is predictable. For a given seed, the same sequence of numbers should be produced. This is extremely important and necessary for testing and debugging programs that use stochastic (aka, random) methods.

The number sequence should approximate a certain statistical distribution. Test your PRNG by generating a large sequence (say 10^6 numbers) and perform several statistical tests on the sequence, particularly the Chi-Squared test (if the distribution is normal). Make a histogram of your sample and see if it looks like you expect.

If you control how the seed is set, the sequence generated should be the same every time, which is suitable for doing white-box testing. When doing your tests, it's also a good idea to "warm up" the generator by running it for 100 iterations or so before collecting your sample.

Scottie T
+2  A: 

You may find some of the responses to this question useful.

Basically, you can't "prove" that the RNG is random, but you can perform various tests to improve your confidence that it is. These tests vary in complexity. Diehard is comprehensive, but it doesn't really offer a yes/no answer, more like a couple of hundred "maybes". At the other end of the spectrum, it's pretty simple to generate a sequence of values (10,000 at least) and then check whether the mean and standard deviaton/variance are as expected.

Dan Dyer
+2  A: 

Here's a CodeProject article that includes an implementation of the Kolmogorov-Smirnov test mentioned in Donald Knuth's volume 2, Seminumerical Algorithms. As InSciTek Jeff mentioned above, there are two issues: testing the algorithm and testing your implementation of the algorithm. The K-S test is likely to find bugs in the implementation, and it's a good start toward testing the quality of the algorithm itself.

John D. Cook
A: 

Way back in school, I was developing a random number generator for a simulation assignment, and needed some way to identify the non-randomness.

I got the bright idea of taking two random numbers and graphing them (x,y). It's amazing how the human brain can detect non-random patterns. ("Random Pattern" is an oxymoron.)

I tweaked the PRNG to remove the striations and starbursts that appeared on the graph, then plotted (log(x), log(y)) for a whole new perspective and repeated the process.

Later, I was forced to take statistics and learned that you can do weird math to quantify randomness.

dar7yl