views:

259

answers:

2

Hello, I have this homework: finding the code words for the symbols in any given alphabet. It says I have to use binary Huffman on groups of three symbols. What does that mean exactly? Do i use regular Huffman on [alphabet]^3? If so, how do I then tell the difference between the 3 symbols in a group?

+1  A: 

I can't quite tell, because your description of the problem isn't all that detailed, but I would guess that they mean that instead of encoding each symbol in your alphabet individually, you are supposed to tread each triple of symbols as a group.

So, for instance, if your alphabet consists of a, b, and c, instead of generating an encoding for each of those individually, you would create an encoding for aaa, aab, aac, etc. Each one of these strings would be treated as a separate symbol in the Huffman algorithm; you can tell them apart simply by doing string comparison on them. If you need to accept input of arbitrary length, you will also need to include in your alphabet symbols that are strings of length 1 or 2. For instance, if you're encoding the string aabacab, you would need to break that down into the symbols aab, aca, and b.

Does that help answer your question? I wasn't quite sure what you're looking for, so please feel free to edit your question or reply in a comment if this hasn't cleared anything up.

Brian Campbell
that's what I was suspecting myself, because it seems practical and raises efficency (is that how it's spelled?), but it sais I should get the code word for every symbol. Normaly I think she should be talking about the symbols that are input, since she never mentioned any others...ever.The teacher never mentioned this method in class (and hasn't answered my emails about it for a week), but I will probably just do this and hope it'll be worth al least half the points. At least now I know I'm not the only one that gets that from the question.thanks a lot, and wish me luck!
Good luck with that! I know it can be frustrating when given an unclear assignment; though if you want help interpreting the assignment, it might be easier if you post the exact wording instead of your own paraphrase. (Oh, and it's spelled "efficiency", since you asked; you were missing an "i").
Brian Campbell
It was actualy a pretty accurate translation from romanian. I am not kidding, my assignment had 2 lines of text (the second line asked me to calculate average word length). Needless to say, I am very disappointed with the school system around here.
Oh, OK. I don't think I could help you with interpreting the question if it's in Romanian. That does sound like a pretty vague assignment; I'm sorry that the school system around there isn't very good, though one of the great things about the internet is that you do have so many other resources to learn from.
Brian Campbell
A: 

Food for thought: what about shorter strings, and permutations of "block boundaries"? What about 1 and 2 character strings? Do you just count off 3, 6, 9, 12, ... chars into your input text and then null pad any uneven lengths at the end?

If the chunks can be of variable size, then it gets really interesting to find the best fit. I suspect it degenerates into a traveling salesman kind of problem, but maybe there's a neat "theorem" or other tool out there for this kind of thing.

Perhaps try all permutations of 3 chars, saving the most frequently used, then try to come up with a good fit for the 1 and 2 char long gaps? Hmm, sounds like it might be really slow, but doable using some kind of recursive divide and counquer approach: pull out the long string of block length N, then recurse into encoding the gaps as length N - 1.

More questions than answers, I'm afraid.

Roboprog
Argh, I repeated much of Brian's answer, just stated a different way while "thinking out loud". Good night...
Roboprog
as an amateur programmer I'd love to fool around with theese ideas, but as an electronics engineer I don't really have much memory to work with, so I should concentrate on making it adaptive and instantaneous.but thanks for the answer