views:

98

answers:

2
    template<class T>
    void huffman(MinHeap<TreeNode<T>*> heap, int n)
    {
      for(int i=0;i<n-1;i++)
      {
        TreeNode<T> *first = heap.pop();
        TreeNode<T> *second = heap.pop();
        TreeNode<T> *bt = new BinaryTreeNode<T>(first, second, first.data, second.data);
        heap.push(bt);
      }
    }

In my Fundamentals of Data Structures in C++ textbook, it gave a 2 page definition of Huffman coding, and the code above. To me, the book wasn't enough detailed, so I've done the googling and I learned how the process of Huffman coding works. The textbook claims that at the end of the code above, a Huffman tree is made. But to me it seems wrong, because a Huffman tree, is not necessary a complete tree, but the code above seems to always give a complete tree because of the heap.push(). So can someone explain to me how this piece of code is not wrong?

+3  A: 

The heap's tree structure does not necessarily match the resulting Huffman tree -- rather, the heap contains a forest of partial Huffman trees, initially each consisting of a single symbol node. The loop then repeatedly takes the two nodes with the least weight, combines them into one node, and puts the resulting combined node back. At the end of the process, the heap contains one finished tree.

Jeffrey Hantin
Then what I don't get is that if the heap is not necessarily the huffman tree, how am I able traverse or use the heap to find the correct leaf node.
The heap is a temporary auxiliary data structure used to improve efficiency of the operation of finding the two Huffman nodes with the least weight. In the end, the heap contains only one element, a single `BinaryTreeNode<T>`, which can be popped out and is then the root of the finished Huffman tree; the heap can then be destroyed because it is no longer needed.
Jeffrey Hantin
thank you so much! you were really helpful.
A: 

Huffman encoding works by taking the two lowest value items at each step. When you first call the function (since your MinHeap is sorted by value) the two lowest value items are popped off and "combined" into a decision node which is then put back into the heap. That node is scored by the sum of its child scores and put back into the heap. Inserting it back into the heap puts it into the right place based on its score; if it's still lower than any other items it'll be first, otherwise it'll be somewhere else.

So this algorithm is building the tree from the bottom up, and by the time you empty the heap you'll have a complete tree. I don't understand what the 'n' is for, though; the loop should be while (heap.size() > 1). Regardless, the tree is not "full", different branches will be different lengths depending on how the frequencies of the items in the initial heap are scored.

dash-tom-bang