views:

1482

answers:

2

I thought I could work this one out myself but I don't seem to be moving forward at all.

Ok, the background:

I need to create a Huffman tree of codes from the information provided by the FFC4, DHT (Define Huffman Table) header in a jpg file. The DHT header defines the Huffman table in this way:

1) A series of 16 bytes. Each byte defines how many symbols have a Huffman code of n amount of bits where n is the position of the byte in the series. (did that make any sense?!!) For example the raw data in hex is:

00 01 05 01 01 01 ... 00

this means that:

Num of bits:    1   2   3   4   5   6   7 ...  16  
Num of codes:   00  01  05  01  01  01  01 ... 00

2) After the list of 16 bytes come the actual symbols themselves. For example:

00 01 02 03 04 05 06 07 08 09 0A 0B

3) Combining the two parts we see that their are:
00 codes with 1 bit.
01 codes with 2 bits: so take the first symbol from the list: 00
05 codes with 3 bits: so take the next 5 symbols from the list: 01 02 03 04 05
.. and so on

4) Finally we have to work out the actual Huffman codes from the information above. If you're a mathematical genius, you may have spotted already that these codes can be worked out by simply incrementing a binary number with the appropriate number of bits for every new code at a certain bit length. When the bit length increases, simply increment the binary number and then double it and carry on. It becomes obvious to everyone else when you've drawn out a number of these binary trees by hand...

5) So this is the code I used to work out the Huffman codes and store them in an array: (first I read the data at 1) and put it in an array: dhtBitnum)

            int binaryCode = 0;
            count = 0;
            StringBuffer codeString = new StringBuffer();               

            //populate array with code strings
            for (int numBits=1; numBits<=16; numBits++) {

                //dhtBitnum contains the number of codes that have a certain number of bits
                for (int i=1; i<=dhtBitnum[(numBits-1)]; i++) {

                    //turn the binary number into a string
                    codeString.append(Integer.toBinaryString(binaryCode)); 
                    //prepend 0s until the code string is the right length
                    for(int n=codeString.length(); n<numBits; n++) {
                        codeString.insert(0, "0");
                    }
                    //put the created Huffman code in an array
                    dhtCodes[count]=codeString.toString();
                    binaryCode++;
                    count ++;
                    codeString.delete(0, codeString.length());
                }
                binaryCode = binaryCode << 1;

            }

Once I have generated the Huffman codes and stored them in order, I can just add the symbols that they refer to in order as they come along in 2). This may not be terribly elegant but it seems to work at least and creates the correct tables.

6) If anyone is still following this, you deserve a medal.

7) Now the problem is I'd like to store this information in a binary tree so that I can efficiently decode the jpg image data later on, rather than searching through arrays everytime. Unfortunately I can't figure out a nice clean and efficient way to do this directly from the information provided in the jpg headers as above.
The only way I can think of is by working out the Huffman codes as above first, then implementing some method that creates nodes as needed and saves the symbols in the appropriate place, using the codes as an address of sorts. However this seems such a round about way that is also duplicating the information I need, I'm sure there must be a much better and simpler method.

8) So if anyone understood my ramblings, I'd be very grateful for some suggestions. I realise this is a very specific problem, but if nothing else the info above might prove helpful to someone. I am still very new at this so excuse my ignorance, easy to understand code is especially welcome!

+1  A: 

As to how to implement this directly I'm not entirely sure, as it takes a while to process the information, but the algorithm should be pretty straight forward if you know about tries. It seems from point 7 that you do not?

I'll add an example:

         °
        / \
       /   \
      °     °
     / \   / \
    a   b  c  d

In this simple trie, if we go left we'll assume left is a 0, right is a 1. So If you encounter the pattern '00' this corresponds with an a. The pattern '10' corresponds with a c. Usually with huffman trees the trie will be unbalanced, i.e.

     °
    / \
   a   °
      / \
     b   °
        / \
       c   d

In this case, the prefix code '0' corresponds to an a. The code 111 to a 'd'.

wds
Thanks wds, tries were mentioned to me in another post on a similar question. I'm not sure I understand the difference between them and a binary tree though. Thing is I'm having problems efficiently creating the trie/tree directly from the DHT header information.
joinJpegs
Don't worry I realise this is probably too esoteric a question for any specific answers. I was kind of hoping I might fall upon a jpg guru with their own decoder who could throw some ready made code my way!
joinJpegs
Hmmm could you edit your question to explain how you know that the symbol with, say, two bits is encoded as 00 and not 11 for instance? Then I can probably come up with an algorithm.
wds
A: 

As @wds said, tries help. The core idea with Huffman decoding is that the bits of the code seqence should be used to "walk" the trie, taking a left when the code has a 0, and a right for a 1, until the code word ends. Then you will be in the trie node storing the replacement data for that code.

unwind
I thought I was being a bit terse so I just added an example to elicit that very point. :)
wds