views:

268

answers:

3

is there better way than just go left or right based on the input digit 0 or 1?

+1  A: 

There are some papers on efficient decoding algorithms for Huffman Trees. Personally, I only used one of them, for academic reasons, but it was a long time ago. The title of the paper was "A Memory Efficient and Fast Huffman Decoding Algorithm" by Hong-Chung Chen, Yue-Li Wang and Yu-Feng Lan.

The algorithm gives results in O(log n) time. In order to use this algorithm, you have to construct a table with all the symbols of your tree (leaves), and for each symbol you have to specify a weight:

w(i) = 2 ^ (h - l)

where h is the height of Huffman Tree and l is the level of the symbol, and a count:

count(i) = count(i-1) + w(i)

The count of root, count(0), is equal to its weight.

When you have all that, there are 3 simple steps in the algorithm, that are described in the paper.

I don't know if this is what you were looking for.

Alex
A: 

Sure, instead of 2-trees you can use k-trees and get O(ln_k(n)) speedup. It's not much.

If the max key size is small (say < 8 bits) or you've got lots of memory, you can use a straight lookup table & get O(1).

Ray
+1  A: 

Yes, there is, and you can use lookup tables.

Note that you will use quite a bit of memory to store these tables, and you will either have to ship that table with the data (probably negating the effect of the compression altogether) or construct the table before decompressing (which would negate some, if not all, of the speedup you gain from it.)

Here's how the table would work.

Let's say, for a sample portion of compressed data that the bit-sequences for symbols are as follows:

1 -> A
01 -> B
00 -> C

What you do now is to produce a table, indexed by a byte (since you need to read 1 byte minimum to decode the first symbol), containing entries like this:

key      symbol       bit-shift
1xxxxxxx    A             7
01xxxxxx    B             6
00xxxxxx    C             6

The x's means you need to store entries with all possible combinations for those x's, with that data. For the first row, this means you would construct a table where every byte-key with the high-bit set would map to A/7.

The table would have entries for all 256 key values, half of them mapped to A/7, and 25% to B/6 and C/6, according to the rules above.

Of course, if the longest bitsequence for a symbol is 9-16 bites, you need a table keyed by a 16-bit integer, and so on.

Now, when you decode, here's how you would do this:

read first and second byte as key, append them together

loop:
   look up the first 8 bits of the current key in the table
   emit symbol found in table
   shift the key bitwise to the left the number of bits specified in the table
   if less than 8 bits left in the key, get next byte and append to key

when at end, just pad with 0-bytes, and as with all Huffman decompression you need to know how many symbols to emit before you start.

Lasse V. Karlsen