I recently started looking at MD5 hashing (in Java) and while I've found algorithms and methods to help me accomplish that, I'm left wondering how it actually works.

For one, I found the following from this URL:

private static String convertToHex(byte[] data) {
    StringBuffer buf = new StringBuffer();
    for (int i = 0; i < data.length; i++) {
        int halfbyte = (data[i] >>> 4) & 0x0F;
        int two_halfs = 0;
        do {
            if ((0 <= halfbyte) && (halfbyte <= 9))
                buf.append((char) ('0' + halfbyte));
            else
                buf.append((char) ('a' + (halfbyte - 10)));
            halfbyte = data[i] & 0x0F;
        } while (two_halfs++ < 1);
    }
    return buf.toString();
}

I haven't found any need to use bit-shifting in Java, so I'm a bit rusty on that. Would someone be kind enough to illustrate (in simple terms) how exactly the above code does the conversion? And what does ">>>" do?

I also found other solutions on StackOverflow, such as here and here, which use BigInteger instead:

try {
   String s = "TEST STRING";
   MessageDigest md5 = MessageDigest.getInstance("MD5");
   md5.update(s.getBytes());   // pass the whole byte array; s.length() counts chars, not bytes
   String signature = new BigInteger(1,md5.digest()).toString(16);
   System.out.println("Signature: "+signature);

} catch (final NoSuchAlgorithmException e) {
   e.printStackTrace();
}

Why does that work too, and which way is more efficient?

Thanks for your time.

+1  A: 

For a thorough explanation of bit shifting, check out the answers to the following SO question: http://stackoverflow.com/questions/141525/absolute-beginners-guide-to-bit-shifting

The code seems to split each byte into two halves, each a number smaller than 16. That makes it easy to determine which character each half represents, using this code:

if ((0 <= halfbyte) && (halfbyte <= 9))
    buf.append((char) ('0' + halfbyte));
else
    buf.append((char) ('a' + (halfbyte - 10)));

This is a simplistic answer, but I'm not that bright anyhow =D

Nuno Furtado
+7  A: 
private static String convertToHex(byte[] data) {
    StringBuffer buf = new StringBuffer();
    for (int i = 0; i < data.length; i++) {

Up to this point it's just basic setup: creating the buffer and starting a loop over all the bytes in the array.

        int halfbyte = (data[i] >>> 4) & 0x0F;

A byte is 8 binary digits, or two hex digits, depending on which base you look at it in. The statement above shifts the high 4 bits down (>>> is the unsigned right shift) and bitwise-ANDs the result with 0000 1111, so that the result is an integer equal to the high 4 bits of the byte (the first hex digit). Note that the byte is promoted to a sign-extended int before the shift, so for negative bytes it is the AND mask that discards the extra high bits.

Say 23 was the input; this is 0001 0111 in binary. The shift and bitwise AND convert this to 0000 0001.

        int two_halfs = 0;
        do {

This just sets up the do/while loop to run twice, once for each half of the byte.

            if ((0 <= halfbyte) && (halfbyte <= 9))
                buf.append((char) ('0' + halfbyte));
            else
                buf.append((char) ('a' + (halfbyte - 10)));

Here we're producing the actual hex digit, basically just using the '0' or 'a' character as a starting point and shifting up to the correct character. The if branch covers the digits 0-9, and the else branch covers the values 10-15 (a-f in hex).

Again, using our example, 0000 0001 is equal to 1 in decimal. We land in the if branch and add 1 to the '0' character to get the character '1', append that to the string and move on.

            halfbyte = data[i] & 0x0F;

Now we set the integer to just the low 4 bits of the byte and repeat.

Again, if our input was 23 ... 0001 0111 after the bitwise AND becomes just 0000 0111, which is 7 in decimal. Repeat the same logic as above and the character '7' is appended.

        } while (two_halfs++ < 1);

Now we just move on to the next byte in the array and repeat.

    }
    return buf.toString();
}
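
For anyone who wants to try the walkthrough above, here is a minimal sketch of the same nibble-splitting idea applied to a single byte. The class and method names (NibbleDemo, toHexDigit, byteToHex) are made up for illustration and are not part of the original code. It also shows why the & 0x0F mask matters for negative bytes.

```java
public class NibbleDemo {
    // Same digit mapping as the walkthrough: 0-9 -> '0'-'9', 10-15 -> 'a'-'f'.
    static char toHexDigit(int halfbyte) {
        return (char) (halfbyte <= 9 ? '0' + halfbyte : 'a' + (halfbyte - 10));
    }

    // Splits one byte into its high and low 4 bits and renders both digits.
    static String byteToHex(byte b) {
        int high = (b >>> 4) & 0x0F; // high 4 bits, mask strips sign-extension bits
        int low = b & 0x0F;          // low 4 bits
        return "" + toHexDigit(high) + toHexDigit(low);
    }

    public static void main(String[] args) {
        System.out.println(byteToHex((byte) 23));   // 17
        System.out.println(byteToHex((byte) 0x0A)); // 0a
        System.out.println(byteToHex((byte) -1));   // ff

        // Without the mask, the sign-extended int still carries extra 1 bits.
        byte b = -1;
        System.out.println(Integer.toHexString(b >>> 4)); // fffffff
    }
}
```

The last line is the reason the mask cannot be dropped: the byte is widened to a 32-bit int before the shift happens, so >>> alone does not leave a value between 0 and 15.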

To answer your next question, the Java API already has a base conversion utility built into BigInteger. See the toString(int radix) documentation.
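
As a quick illustration of that radix conversion, here is a small sketch using the same input, 23, as the walkthrough. RadixDemo and inBase are hypothetical names chosen for this example; BigInteger.valueOf and toString(int radix) are the standard API calls.

```java
import java.math.BigInteger;

public class RadixDemo {
    // Renders a value in the given base using BigInteger's built-in conversion.
    static String inBase(long value, int radix) {
        return BigInteger.valueOf(value).toString(radix);
    }

    public static void main(String[] args) {
        System.out.println(inBase(23, 16)); // 17
        System.out.println(inBase(23, 2));  // 10111
    }
}
```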

Not knowing the implementation used by the Java API, I can't say for sure, but I'd be willing to bet that the Java implementation is more efficient than the first, somewhat simple algorithm you posted.

tschaible
+1 for the effort and for beating me to it. The only thing I would add is a reference to the bitwise operation documentation: http://www.j2ee.me/docs/books/tutorial/java/nutsandbolts/op3.html
Welbog
thanks for this explanation
Nuno Furtado
Thank you very much for the detailed explanation!
aberrant80
+2  A: 

To answer this bit:

Why does that work too

It doesn't. At least, not the same way that the loop version does. new BigInteger(...).toString(16) will not show leading zeroes, which the loop version will. Usually, for something like writing out a byte array (especially one representing something like a hash), you would want fixed-length output (32 hex digits for a 16-byte MD5 digest), so if you want to use that version you'd have to pad it out appropriately.
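
To make the difference concrete, here is a rough sketch of one way to do that padding. PaddingDemo, unpadded and padded are names invented for this example, and left-padding with '0' up to two digits per byte is just one possible approach, not the only one.

```java
import java.math.BigInteger;

public class PaddingDemo {
    // The BigInteger approach as posted: leading zero digits are dropped.
    static String unpadded(byte[] data) {
        return new BigInteger(1, data).toString(16);
    }

    // Same conversion, left-padded with '0' to two hex digits per byte.
    static String padded(byte[] data) {
        String hex = new BigInteger(1, data).toString(16);
        StringBuilder sb = new StringBuilder();
        for (int i = hex.length(); i < data.length * 2; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x0A, (byte) 0xFF};
        System.out.println(unpadded(data)); // aff
        System.out.println(padded(data));   // 000aff
    }
}
```

The loop version from the question would print 000aff for the same input, so only the padded form matches it.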

Cowan
Thank you for noting the difference.
aberrant80