Hi, my understanding of the entropy formula is that it's used to compute the minimum number of bits required to represent some data. It's usually worded differently when defined, but the previous understanding is what I relied on until now.
Here's my problem. Suppose I have a sequence of 100 '1' followed by 100 '0' = 200 bits. The alphabet is {0,1}, base of entropy is 2. Probability of symbol "0" is 0.5 and "1" is 0.5. So the entropy is 1 or 1 bit to represent 1 bit.
However you can run-length encode it with something like 100 / 1 / 100 / 0 where it's number of bits to output followed by the bit. It seems like I have a representation smaller than the data. Especially if you increase the 100 to much larger number.
I'm using: http://en.wikipedia.org/wiki/Information_entropy as reference at the moment. Where did I go wrong? Is it the probability assigned to symbols? I don't think it's wrong. Or did I get the connection between compression and entropy wrong? Anything else?
Thanks.
Edit
Following some of the answers my followup are: would you apply the entropy formula to a particular instance of a message to try to find out its information content? Would it be valid to take the message "aaab" and say the entropy is ~0.811. If yes then what's the entropy of 1...10....0 where 1s and 0s are repeated n times using the entropy formula. Is the answer 1?
Yes I understand that you are creating a random variable of your input symbols and guessing at the probability mass function based on your message. What I'm trying to confirm is the entropy formula does not take into account the position of the symbols in the message.