I'm a student of computer science and we have to use BaseX (a pure Java OSS XML database) in one of our courses. While browsing through the code I discovered the following piece of code:

  /**
    * Returns a md5 hash.
    * @param pw password string
    * @return hash
    */
   public static String md5(final String pw) {
     try {
       final MessageDigest md = MessageDigest.getInstance("MD5");
       md.update(Token.token(pw));
       final TokenBuilder tb = new TokenBuilder();
       for(final byte b : md.digest()) {
         final int h = b >> 4 & 0x0F;
         tb.add((byte) (h + (h > 9 ? 0x57 : 0x30)));
         final int l = b & 0x0F;
         tb.add((byte) (l + (l > 9 ? 0x57 : 0x30)));
       }
       return tb.toString();
     } catch(final Exception ex) {
       Main.notexpected(ex);
       return pw;
     }
   }

(source: https://svn.uni-konstanz.de/dbis/basex/trunk/basex/src/main/java/org/basex/util/Token.java)

Just out of interest: what is happening there? Why these byte operations after the MD5 digest has been computed? The docstring says it returns an MD5 hash. Does it?

+3  A: 

I didn't look up the definitions for the classes used, but the byte operations seem to be encoding the returned byte array into a string of hex characters.

for(final byte b : md.digest()) {
  // get the high 4 bits of the current byte
  final int h = b >> 4 & 0x0F;
  // convert into a hex digit: 0x30 is '0'; 0x57 + 10 = 0x61 is 'a', so 10..15 map to 'a'..'f'
  tb.add((byte) (h + (h > 9 ? 0x57 : 0x30)));
  // the same for the low 4 bits
  final int l = b & 0x0F;
  tb.add((byte) (l + (l > 9 ? 0x57 : 0x30)));
}

This is a great example of why using magic numbers is bad. I, for one, honestly couldn't remember that 0x57+10 is the ASCII/Unicode codepoint for 'a' without checking it in a Python interpreter.
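To check that the result really is the usual hex form of an MD5 digest, here is a minimal stand-alone sketch of the same loop using only the JDK. The class name `Md5HexSketch` is hypothetical, `StringBuilder` replaces BaseX's `TokenBuilder`, and UTF-8 encoding is assumed as a stand-in for `Token.token(pw)`, whose definition I did not look up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-alone class for illustration; not part of BaseX.
final class Md5HexSketch {
  static String md5Hex(final String pw) {
    try {
      final MessageDigest md = MessageDigest.getInstance("MD5");
      // Assumption: UTF-8 bytes stand in for BaseX's Token.token(pw).
      final byte[] digest = md.digest(pw.getBytes(StandardCharsets.UTF_8));
      final StringBuilder sb = new StringBuilder(digest.length * 2);
      for (final byte b : digest) {
        final int h = b >> 4 & 0x0F; // high nibble, 0..15
        final int l = b & 0x0F;      // low nibble, 0..15
        // Same arithmetic as the BaseX code: 0..9 -> '0'..'9', 10..15 -> 'a'..'f'.
        sb.append((char) (h + (h > 9 ? 0x57 : 0x30)));
        sb.append((char) (l + (l > 9 ? 0x57 : 0x30)));
      }
      return sb.toString();
    } catch (final NoSuchAlgorithmException ex) {
      // Every Java platform is required to provide MD5, so this is not expected.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(final String[] args) {
    // MD5 of the empty string: d41d8cd98f00b204e9800998ecf8427e
    System.out.println(md5Hex(""));
  }
}
```

The output for the empty string and for "abc" matches the well-known MD5 test values, so the byte operations only change the representation (16 raw bytes to 32 hex characters), not the hash itself.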

Matti Virkkunen
Thanks for the clarification.
Martin
A: 

I guess Matti is right: md.digest() returns a byte[], and BaseX uses Tokens in favor of Strings (hence the TokenBuilder). So the digest bytes are hex-encoded into a Token, which is then converted to a String.

Not exactly easy to read, but quite similar to what Apache Commons does in its Codec library to get the String value of an MD5 hash.
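For comparison, a common way to write such an encoder is a digit lookup table instead of arithmetic on character codes. The sketch below is my own JDK-only illustration of that idea; the class and method names are hypothetical, and it is not copied from Commons Codec.

```java
// Hypothetical helper for illustration; a table-based hex encoder.
final class HexTableSketch {
  // Index i holds the lowercase hex digit for the value i (0..15).
  private static final char[] DIGITS = "0123456789abcdef".toCharArray();

  static String toHex(final byte[] data) {
    final char[] out = new char[data.length * 2];
    int i = 0;
    for (final byte b : data) {
      out[i++] = DIGITS[(b >> 4) & 0x0F]; // high nibble
      out[i++] = DIGITS[b & 0x0F];        // low nibble
    }
    return new String(out);
  }
}
```

Calling `HexTableSketch.toHex(md.digest())` would give the same string as the loop in the question, assuming the digest bytes are the same.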

michael
A: 

"This is a great example of why using magic numbers is bad."

Well, this is a core method, which isn't supposed to be modified by others – and this looks like the most efficient way to do it. But, true, the documentation could be better. Talking about core methods, it's worthwhile looking at code like Integer.getChars():

http://www.java2s.com/Open-Source/Java-Document/6.0-JDK-Core/lang/java/lang/Integer.java.htm

Christian
@Christian: No matter how "core" your code is, it will eventually be read by someone (*like just now*) and therefore it should be readable. Efficiency wouldn't be sacrificed by writing the expressions in a more readable format, or at least leaving a comment for the poor reader.
Matti Virkkunen
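As a sketch of what a more readable but equally cheap version could look like, the magic numbers can be replaced with character literals. The class and method names here are hypothetical; `magicHexDigit` repeats the original expression so the two can be compared directly.

```java
// Hypothetical helper for illustration; not part of BaseX.
final class ReadableHexSketch {
  // Readable form: '0'..'9' for 0..9, 'a'..'f' for 10..15.
  static char hexDigit(final int nibble) {
    return (char) (nibble < 10 ? '0' + nibble : 'a' + (nibble - 10));
  }

  // The original expression: 0x30 is '0', and 0x57 is 'a' - 10.
  static char magicHexDigit(final int nibble) {
    return (char) (nibble + (nibble > 9 ? 0x57 : 0x30));
  }

  public static void main(final String[] args) {
    // Print both variants side by side for every nibble value.
    for (int n = 0; n < 16; n++) {
      System.out.println(n + " -> " + hexDigit(n) + " " + magicHexDigit(n));
    }
  }
}
```

Both methods compile to the same handful of integer operations, so the readable form should not cost anything at run time.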