views:

35

answers:

3

Given a infinite stream of random 0's and 1's that is from a biased (e.g. 1's are more common than 0's by a know factor) but otherwise ideal random number generator, I want to convert it into a (shorter) infinite stream that is just as ideal but also unbiased.

Looking up the definition of entropy finds this graph showing how many bits of output I should, in theory, be able to get from each bit of input.

Entropy of a coin flip in bits versus the fairness of the coin

The question: Is there any practical way to actually implement a converter that is nearly ideally efficient?

+4  A: 

There is a well-known device due to Von Neumann for turning an unfair coin into a fair coin. We can use this device to solve our problem here.

Repeatedly draw two bits from your biased source until you obtain a pair for which the bits are different. Now return the first bit, discarding the second. This produces an unbiased source. The reason this works is because regardless of the source, the probability of a 01 is the same as a probability of a 10. Therefore the probability of a 0 conditional on 01 or 10 is 1/2 and the probability of a 1 conditional on 01 or 10 is 1/2.

Jason
Also, the pairs must not overlap.
Joe Koberg
What is the efficiency of that? How many bits need be used to generate one output bit? (Aside from that, nice and simple, +1)
BCS
That depends on the distribution of your data. If you get a string of 10000 "1" bits interspersed with one "0" bit - the algorithm will produce a single output bit.
Joe Koberg
OK. Tack "as a function of the distribution" on the start of the question.
BCS
It's not magic.... If your random bit source has long runs, they will be absorbed. If it has frequency components, they might show up in the output. For example an alternating input of 0101010 will yield a constant 1 or 0 output, and an input of 00110011001100110011 will yield NO output!)
Joe Koberg
Wouldn't the `00110011...` give `101010101...`? I'm assuming that on the flips-equal case that you shift in the next flip rather than discard both. --- Also, little empirical study using excel shows that this filter's bits in vs. bits out inefficiency (compared to the theoretic ideal) ranges runs from the high 3's to low 5's, with only slightly higher cost (but much higher variance) for high biases.
BCS
Yes I guess it would depend highly on details like that. But the point is that the thing doesn't whiten everything perfectly - it just makes the split a lot closer to 50%.
Joe Koberg
The wikipedia article says "successive pairs of consecutive bits (non-overlapping) from the input stream" ...
Joe Koberg
+2  A: 

Please see

Joe Koberg
The Randomness extractor seems to be the closest to what I'm looking for. (The others seems to apply to finite data sets and I was really looking for a stream operation)
BCS
A: 

Hoffman encode the input.

Given that the input is of a known bias, you can compute a probability distribution for check sums of each n bit segment. From that construct a Hoffman code and then just encode the sequence.

I'm not sure but one potential problem is that this might introduce some correlation between sequential bits.

BCS