views: 89
answers: 2

This is based on this question. A number of the proposed answers generate non-uniform distributions, and I started wondering how to quantify the non-uniformity of the output. I'm not looking for patterning issues, just single-value aspects.

What are the accepted procedures?


My current thinking is to compute the average Shannon entropy per call by computing the entropy of each value and taking a weighted average. This can then be compared to the expected value.
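
Something like this sketch is what I have in mind (Python only for illustration; `probs` is a stand-in for the expected occurrence rates):

    import math

    def entropy_per_call(probs):
        # Shannon entropy in bits per call: each value contributes -log2(p)
        # bits of "surprise", weighted by how often that value occurs.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Example: a slightly skewed 7-value output vs. the uniform ideal.
    probs = [1/7 + 1e-6] * 3 + [1/7 - 1e-6] * 3 + [1/7]
    observed = entropy_per_call(probs)
    ideal = math.log2(7)            # best possible for 7 equally likely outputs
    print(observed, ideal, ideal - observed)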

My concerns are

  1. Is this correct?
  2. How to compute these values without losing precision?

For #1, I'm wondering if I've got the approach right.

For #2, the concern is that I would be processing numbers with magnitudes like 1/7 +/- 1e-18, and I'm worried that floating-point error will kill me for anything but the smallest problems. The exact form of the computation could make a major difference here, and I seem to recall there are some ASM instructions for special log cases, but I can't find the docs on them.
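
One way I can see to dodge the cancellation (just a sketch, assuming the probabilities are expressed as deviations from 1/m) is to compute the entropy deficit log2(m) - H = sum(p_i * log2(1 + d_i)) directly with log1p, which stays accurate for tiny d_i, and use fsum so the summation itself doesn't drown the small terms:

    import math

    def entropy_deficit_bits(deviations):
        # deviations[i] = d_i where p_i = (1 + d_i)/m, so the d_i are tiny.
        # The deficit log2(m) - H equals sum(p_i * log2(1 + d_i)); log1p keeps
        # that accurate for d_i near zero, and fsum keeps the summation from
        # losing the tiny terms to rounding error.
        m = len(deviations)
        return math.fsum((1 + d) / m * math.log1p(d) for d in deviations) / math.log(2)

    # Example: 7 buckets whose probabilities are off from 1/7 by ~1e-12.
    devs = [1e-12, -1e-12, 2e-12, -2e-12, 5e-13, -5e-13, 0.0]
    print(entropy_deficit_bits(devs))   # small but meaningful, not lost to rounding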


In this case the use case is to take a "good" PRNG for the range [1,n] and derive a generator for the range [1,m]. The question is: how much worse are the results than the input?

What I have is expected occurrence rates for each output value.
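
As a concrete (hypothetical) example of what those rates look like, suppose the reduction from [1,n] to [1,m] were a plain modulo; then some outputs occur ceil(n/m)/n of the time and the rest floor(n/m)/n:

    def modulo_reduction_probs(n, m):
        # Probabilities of each output in [1, m] when a uniform value in [1, n]
        # is reduced by ((x - 1) % m) + 1 (an assumed mapping, for illustration).
        # The first n % m outputs each absorb one extra source value.
        q, r = divmod(n, m)
        return [(q + 1) / n if i < r else q / n for i in range(m)]

    # Example: reducing a uniform [1, 32] source to [1, 7].
    print(modulo_reduction_probs(32, 7))   # four outputs at 5/32, three at 4/32

Rates like these would then feed the entropy calculation above.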

+4  A: 

NIST has a set of documents and tools for statistically analyzing random number generators across a variety of metrics.

http://csrc.nist.gov/groups/ST/toolkit/rng/index.html

Many of these tests are also incorporated into the Dieharder PRNG test suite.

http://www.phy.duke.edu/~rgb/General/rand_rate.php

There are a ton of different metrics, because there are many, many different ways to use PRNGs. You can't analyze a PRNG in a vacuum - you have to understand the use case. These tools and documents provide a lot of information to help you with this, but at the end of the day you'll still have to understand what you actually need before you can determine whether the algorithm is suitable. The NIST documentation is thorough, if somewhat dense.

Adam Davis
+1  A: 

This page discusses one way of checking if you are getting a bad distribution: plotting the pseudo-random values in a field and then just looking at them.
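
Roughly (not the page's exact method, just a sketch with matplotlib): plot consecutive outputs against each other and look for stripes or clustering.

    import random
    import matplotlib.pyplot as plt

    # Plot consecutive output pairs as points; a decent generator fills the
    # square evenly, while lattice structure or banding shows up as stripes.
    # random.random() stands in for the generator under test.
    samples = [random.random() for _ in range(20000)]
    plt.scatter(samples[0::2], samples[1::2], s=1)
    plt.title("Consecutive output pairs")
    plt.show()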

twk
Not quantifiable, and if I'm getting 0.25000000001 percent in one bucket the eye will never see it.
BCS