views:

480

answers:

2

As a follow up question related to my question regarding efficient way of storing huffman tree's I was wondering what would be the fastest and most efficient way of searching a binary tree (based on the Huffman coding output) and storing the path taken to a particular node.

This is what I currently have:

  • Add root node to queue
  • while queue is not empty, pop item off queue
    • check if it is what we are looking
      • yes: Follow a head pointer back to the root node, while on each node we visit checking whether it is the left or right and making a note of it.
      • break out of the search
    • enqueue left, and right node

Since this is a Huffman tree, all of the entries that I am looking for will exist. The above is a breadth first search, which is considered the best for Huffman trees since items that are in the source more often are higher up in the tree to get better compression, however I can't figure out a good way to keep track of how we got to a particular node without backtracking using the head pointer I put in the node.

In this case, I am also getting all of the right/left paths in reverse order, for example, if we follow the head to the root, and we find out that from the root it is right, left, left, we get left, left, right. or 001 in binary, when what I am looking for is to get 100 in an efficient way.

Storing the path from root to the node as a separate value inside the node was also suggested, however this would break down if we ever had a tree that was larger than however many bits the variable we created for that purpose could hold, and at that point storing the data would also take up huge amounts of memory.

+2  A: 

Create a dictionary of value -> bit-string, that would give you the fastest lookup.

If the values are a known size, you can probably get by with just an array of bit-strings and look up the values by their index.

Lasse V. Karlsen
This still has the issue that the bit-string has to be stored, now we moved it from having the bit-string stored in a leaf node to a dictionary, with basically the same memory trade-off.
X-Istence
But you did ask for the fastest way, didn't you?
Lasse V. Karlsen
Sure, I absolutely did, but I asked specifically for a binary tree. Pre-computing the result was not allowed because I specifically excluded it with the last paragraph in my question.
X-Istence
A: 

If you're decoding Huffman-encoded data one bit at a time, your performance will be poor. As much as you'd like to avoid using lookup tables, that's the only way to go if you care about performance. The way Huffman codes are created, they are left-to-right unique and lend themselves perfectly to a fast table lookup.

BitBank