In general, about 50% of the incoming symbol stream would have to consist of a given symbol for Huffman to code it as a single bit. The reason for this is that due to how Huffman coding works (the one symbol's encoding cannot be the prefix of another), by encoding a symbol with a single bit, you are requiring that the first bit for every other symbol be the opposite value (i.e. if one symbol is encoded as 0
, everything else must start with 1
plus at least one more bit). Since you're eliminating half of the possible encoding space for any given bit length, you need to be gaining a way to encode at least half of the symbols being input in order to break even.
Note that there is a special case where the symbol space only consists of 3 symbols. In such a case, whichever symbol has the greatest frequency will be encoded with 1 bit (since the other two will be the 2nd-bit variations of whichever first-bit value isn't chosen) - if 2 or more have equally greater probability, either one could be encoded. Thus, in the 3-symbol case it's possible that a symbol with, say, 34% probability could theoretically be coded as a single bit (say, 0
) while the other two might have 33% or lower probabilities and be coded as 10
and 11
.
So if you're considering all possibilities, then technically anything 1/3 or above could potentially be encoded as a single bit (in the 3-symbols case).