views: 1007

answers: 5

Hi, my understanding of the entropy formula is that it gives the minimum number of bits required to represent some data. It's usually worded differently when defined, but that's the understanding I've been relying on until now.

Here's my problem. Suppose I have a sequence of 100 '1's followed by 100 '0's, 200 bits in total. The alphabet is {0,1} and the base of the entropy is 2. The probability of symbol '0' is 0.5 and of '1' is 0.5, so the entropy is 1, i.e. 1 bit to represent each bit.

However, you can run-length encode it as something like 100 / 1 / 100 / 0, where each pair is the number of bits to output followed by the bit itself. It seems like I have a representation smaller than the data, especially if you increase the 100 to a much larger number.
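
To make the scheme concrete, here's a rough Python sketch of the encoder I have in mind (the function name is just for illustration):

```python
# Rough sketch of the run-length idea: emit (run length, bit) pairs.
def rle_encode(bits):
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((j - i, bits[i]))  # (number of bits to output, the bit itself)
        i = j
    return runs

print(rle_encode("1" * 100 + "0" * 100))  # [(100, '1'), (100, '0')]
```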

I'm using http://en.wikipedia.org/wiki/Information_entropy as a reference at the moment. Where did I go wrong? Is it the probability assigned to the symbols? I don't think that's wrong. Or did I get the connection between compression and entropy wrong? Anything else?

Thanks.

Edit

Following some of the answers my followup are: would you apply the entropy formula to a particular instance of a message to try to find out its information content? Would it be valid to take the message "aaab" and say the entropy is ~0.811. If yes then what's the entropy of 1...10....0 where 1s and 0s are repeated n times using the entropy formula. Is the answer 1?
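
To show exactly how I'm applying the formula, here's a quick Python sketch (the helper name is mine, just for illustration):

```python
from collections import Counter
from math import log2

# Estimate p(symbol) from the message itself, then apply the entropy formula.
def empirical_entropy(msg):
    counts = Counter(msg)
    n = len(msg)
    return -sum(c / n * log2(c / n) for c in counts.values())

print(empirical_entropy("aaab"))             # ~0.811
print(empirical_entropy("1" * 5 + "0" * 5))  # 1.0, and it stays 1.0 for any n
```

As far as I can tell, the result depends only on the symbol counts, not on where the symbols appear in the message.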

Yes, I understand that you are creating a random variable from your input symbols and guessing at the probability mass function based on your message. What I'm trying to confirm is that the entropy formula does not take the position of the symbols in the message into account.

+4  A: 

Have a look at Kolmogorov complexity

The minimum number of bits into which a string can be compressed without losing information. This is defined with respect to a fixed, but universal decompression scheme, given by a universal Turing machine.

And in your particular case, don't restrict yourself to the alphabet {0,1}. For your example, use {0...0, 1...1} (a hundred 0's and a hundred 1's).
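
For instance, a small sketch treating each run as a single symbol:

```python
from collections import Counter
from math import log2

# With the alphabet {0...0, 1...1}, the message is only two symbols long.
msg = ["1" * 100, "0" * 100]
counts = Counter(msg)
n = len(msg)
entropy_per_symbol = -sum(c / n * log2(c / n) for c in counts.values())
print(entropy_per_symbol, entropy_per_symbol * n)  # 1.0 bit/symbol, 2.0 bits total
```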

Anonymous
A: 

I don't see any random variable here; your string is fixed!

shodanex
+4  A: 

Or did I get the connection between compression and entropy wrong?

You're pretty close, but this last question is where the mistake was. If you're able to compress something into a form smaller than its original representation, it means that the original representation had at least some redundancy. Each bit in the message really wasn't conveying 1 bit of information.

Because redundant data does not contribute to the information content of a message, it also does not increase its entropy. Imagine, for example, a "random bit generator" that only returns the value "0". This conveys no information at all! (Formally, the symbol "0" has probability 1 and "1" has probability 0; with the usual convention that 0·log 0 = 0, the entropy formula gives exactly 0 bits.)

By contrast, had you simulated a large number of random coin flips, it would be very hard to reduce the size of this message by much. Each bit would be contributing close to 1 bit of entropy.
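
To see the contrast concretely, here's a rough sketch (using bytes as a stand-in for the bits, and zlib as a stand-in for an ideal compressor):

```python
import os
import zlib

redundant = b"\x01" * 100 + b"\x00" * 100  # 100 "ones" followed by 100 "zeros"
random_ish = os.urandom(200)               # 200 bytes of (pseudo)random data

print(len(zlib.compress(redundant)))   # far smaller than 200: lots of redundancy
print(len(zlib.compress(random_ish)))  # around 200 or slightly more: little to extract
```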

When you compress data, you extract that redundancy. In exchange, you pay a one-time entropy price by having to devise a scheme that knows how to compress and decompress this data; that itself takes some information.

However, you can run-length encode it as something like 100 / 1 / 100 / 0, where each pair is the number of bits to output followed by the bit itself. It seems like I have a representation smaller than the data, especially if you increase the 100 to a much larger number.

To summarize, the fact that you could devise a scheme to make the encoding of the data smaller than the original data tells you something important. Namely, it says that your original data contained very little information.


Further reading

For a more thorough treatment of this, including a few examples of exactly how you'd calculate the entropy of an arbitrary sequence of digits, check out this short whitepaper.

John Feminella
Thanks. In terms of the formula itself, will the formula tell me this, i.e. that the message contains little information? Please see my edit.
Budric
+3  A: 

Your encoding works in this example, but it is possible to conceive of an equally valid case: 010101010101..., which would be encoded as 1 / 0 / 1 / 1 / ...

Entropy is measured across all possible messages that can be constructed in the given alphabet, and not just pathological examples!
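
A quick sketch of the two cases (using Python's groupby just to count the runs):

```python
from itertools import groupby

# Run-length pairs: (run length, bit) for each maximal run.
def runs(bits):
    return [(len(list(g)), b) for b, g in groupby(bits)]

print(len(runs("01" * 100)))             # 200 runs: the "compressed" form is no shorter
print(len(runs("1" * 100 + "0" * 100)))  # 2 runs: the lucky, highly compressible case
```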

butterchicken
+2  A: 

John Feminella got it right, but I think there is more to say.

Shannon entropy is based on probability, and probability is always in the eye of the beholder.

You said that 1 and 0 were equally likely (0.5). If that is so, then the string of 100 1s followed by 100 0s has a probability of 0.5^200, of which -log (base 2) is 200 bits, as you expect. However, that string's contribution to the entropy (in Shannon terms) is its information content times its probability, or 200 * 0.5^200, still a really small number.
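
Checking that arithmetic (a quick sketch; 0.5^200 is an exact power of two, so floating point handles it fine):

```python
from math import log2

p = 0.5 ** 200    # probability of this exact 200-bit string
print(-log2(p))   # 200.0 bits of information content
print(200 * p)    # its p * (information content) term: about 1.2e-58, a really small number
```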

This is important because if you do run-length coding to compress the string, it will come out short for this particular string, but averaged over all 2^200 strings it will not do well. With luck, it will average out to about 200 bits, but not less.

On the other hand, if you look at your original string and say it is so striking that whoever generated it is likely to generate more like it, then you are really saying its probability is larger than 0.5^200. In that case you are making a different assumption about the original probability structure of the generator of the string, namely that it has lower entropy than 200 bits.

Personally, I find this subject really interesting, especially when you look into Kolmogorov (Algorithmic) information. In that case, you define the information content of a string as the length of the smallest program that could generate it. This leads to all sorts of insights into software engineering and language design.

I hope that helps, and thanks for your question.

Mike Dunlavey
Thanks for your comment. Treating the string as an outcome of the random experiment (rather than the symbols making up the string) is an interesting perspective.
Budric
Right. That's the difference between information and data. It is a very useful perspective.
Mike Dunlavey