Hi,
I need huffman code(best in python or in java), which could encode text not by one character (a = 10, b = 11)
, but by two (ab = 11, ag = 10)
. Is it possible and if yes, where could i find it, maybe it's somewhere in the internet and i just can'd find it?
views:
164answers:
3Huffman code doesn't care about characters, it cares about symbols. Generally, it is used to encode the alphabet / other single characters, but can very easily be generalized to encode strings of characters. Basically, you would just take an existing implementation and allow symbols to be strings rather than characters. A leaf node would then correspond to a list of strings.
There is probably some code somewhere. But this sounds like a parsing and tokenising question. One of the first questions I would be answering is how many unique pairs are you dealing with. Huffman encoding works best with small numbers of tokens. For example, the 101 characters on your keyboard. But if your two characters can be anything, you are now expanding the maximum number of characters massively.
There's a Huffman encoder example distributed with the Python bitarray module, if that's any use to you.