I have a question which I think involves "conditional entropy" in the field of information theory. I am trying to wrap my head around it, but could use some help. Consider an example in which we have four houses. Eight people live in the first house, four people live in the second house, two people live in the third house, and two people live in the fourth house. So, four houses and sixteen people. If I simply choose one of these people at random, that choice is a uniform selection from among sixteen people, yielding an information entropy of 4 bits.
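
To spell that 4-bit figure out: each of the sixteen people has probability 1/16, so the entropy is -16*(1/16)*log2(1/16) = log2(16) = 4 bits. A quick check in Python:

from math import log
print(log(16, 2))   # 4.0 bits for a uniform pick among 16 people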

But now consider a two-step selection in which I first choose one house at random, and then choose one of the people in the selected house. The first step, picking one house from the four available, generates two bits of information entropy. In the 25% of cases where I pick the first house, the second step adds another three bits to choose one person from among the eight people living there. In another 25% of cases, I need only two more bits to select one person from the four that live in the second house. And finally, in fully half of the cases, I need only a single bit to pick one person from the pair that lives in either the third or the fourth house.

Somehow, it seems to me that the weighted average of bit-counts for the two-step approach should generate the same four-bit total that the single-step method requires. But I cannot get the figures to add up, so clearly there is more to the math than I am considering. I was expecting that you should simply be able to add up the probability-weighted bit-counts like so:

(bits for picking a house) + (expected bits for picking a person in that house) ==

log2(4) + [(1/4)*log2(8) + (1/4)*log2(4) + (1/4)*log2(2) + (1/4)*log2(2)]

But this produces a result of 3.75 bits, not the 4 bits that I am expecting. Here is a bit of Python that I used to evaluate this:

from math import log

def log2(x):
    return log(x, 2)

# Entropy of picking a house, plus the weighted entropy of
# picking a person within the chosen house.
x = log2(4) + ((1.0/4)*log2(8) + (1.0/4)*log2(4) + (1.0/4)*log2(2) + (1.0/4)*log2(2))
print(x)  # 3.75

So, something is missing from my figures. Can anyone point me in the right direction?

Many thanks, -- jeffh

+1  A: 

If you choose a house at random (with uniform probability, UP for short), then choose a resident at random (UP), you're not choosing one out of 16 UP -- you have a somewhat skewed distribution, which unsurprisingly yields lower entropy (UP maximizes entropy). Eight people are being selected with a probability of 1/32 each, four are being selected with a probability of 1/16 each, and the other four with a probability of 1/8 each. This distribution has an entropy of 3.75 bits, just as you computed with your different approach.
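
To make that concrete, here is a small Python sketch that computes the entropy of that skewed per-person distribution directly (8 people at 1/32, 4 at 1/16, 4 at 1/8), alongside the uniform case:

from math import log

def log2(x):
    return log(x, 2)

def entropy(probs):
    # Shannon entropy in bits: H = -sum(p * log2(p))
    return -sum(p * log2(p) for p in probs)

# Per-person probabilities under the two-step (house, then person) selection.
two_step = [1.0/32] * 8 + [1.0/16] * 4 + [1.0/8] * 4
print(entropy(two_step))   # 3.75 bits

# Uniform selection of one person out of sixteen.
uniform = [1.0/16] * 16
print(entropy(uniform))    # 4.0 bits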

Alex Martelli
Yes, this is clearly the answer. The two-step process selects people from a different probability distribution than the one-step (uniform probability) selection process.
Justin Peel
Yes, thanks Alex. I understand where I went wrong. Somehow I became convinced that the bit-counts had to reach the same total regardless of the decision process. But that completely ignores the entire point of the entropy calculation! I see now how putting the people into houses adds information, and therefore reduces the entropy.
Jeff Hultquist
@Jeff, I prefer to think of it in terms of probabilities rather than information, but I guess this just reflects my background -- in the end, the computations come to the same total;-).
Alex Martelli