I know this sounds like a broad question but I can narrow it down with an example. I am VERY new at Java. For one of my "learning" projects, I wanted to create an in-house MD5 file hasher for us to use. I started off very simple by attempting to hash a string and then moving on to a file later. I created a file called MD5Hasher.java and wrote the following:

import java.security.*;
import java.io.*;
public class MD5Hasher{
    public static void main(String[] args){
        String myString = "Hello, World!";
        byte[] myBA = myString.getBytes();
        MessageDigest myMD;
        try{
            myMD = MessageDigest.getInstance("MD5");
            myMD.update(myBA);
            byte[] newBA = myMD.digest();
            String output = newBA.toString();
            System.out.println("The Answer Is: " + output);
        } catch(NoSuchAlgorithmException nsae){
            // print error here
        }
    }
}

I visited java.sun.com to view the javadocs for java.security to find out how to use MessageDigest class. After reading I knew that I had to use a "getInstance" method to get a usable MessageDigest object I could use. The Javadoc went on to say "The data is processed through it using the update methods." So I looked at the update methods and determined that I needed to use the one where I fed it a byte array of my string, so I added that part. The Javadoc went on to say "Once all the data to be updated has been updated, one of the digest methods should be called to complete the hash computation." I, again, looked at the methods and saw that digest returned a byte array, so I added that part. Then I used the "toString" method on the new byte array to get a string I could print. However, when I compiled and ran the code all that printed out was this:

The Answer Is: [B@4cb162d5

I have done some looking around here on StackOverflow and found some information here:

http://stackoverflow.com/questions/415953/generate-md5-hash-in-java

that gave the following example:

String plaintext = "your text here";
MessageDigest m = MessageDigest.getInstance("MD5");
m.reset();
m.update(plaintext.getBytes());
byte[] digest = m.digest();
BigInteger bigInt = new BigInteger(1,digest);
String hashtext = bigInt.toString(16);
// Now we need to zero pad it if you actually want the full 32 chars.
while(hashtext.length() < 32 ){
    hashtext = "0"+hashtext;
}

It seems the only part I MAY be missing is the "BigInteger" part, but I'm not sure.

So, after all of this, I guess what I am asking is, how do you know to use the "BigInteger" part? I wrongly assumed that the "toString" method on my newBA object would convert it to a readable output, but I was, apparently, wrong. How is a person supposed to know which way to go in Java? I have a background in C so this Java thing seems pretty weird. Any advice on how I can get better without having to "cheat" by Googling how to do something all the time?

Thank you all for taking the time to read. :-)

+2  A: 

It is OK to google for answers as long as you (eventually) understand what you copy-pasted into your app :-)

In general, I recommend starting with a good Java introductory book, or web tutorial. See these threads for more tips:

Péter Török
And the reason for the downvote is ...?
Péter Török
+2  A: 

A MessageDigest computes a byte array from its input. The string that you usually see (such as 1f3870be274f6c49b3e31a0c6728957f) is actually just a conversion of that byte array to a hexadecimal string.

When you call toString() on the byte[] returned by MessageDigest.digest(), Java falls back to the default Object.toString(), because arrays do not override it. That returns a sort of reference to the array (its type name and hash code), not the actual bytes.

In the code you posted, the byte array is changed to an integer (in this case a BigInteger because it would be extremely large), and then converted to hexadecimal to be printed to a String.

The byte array computed by the digest represents a number (a 128-bit number according to http://en.wikipedia.org/wiki/MD5), and that number can be converted to any other base, so the result of the MD5 could be represented as a base-10 number, a base-2 number (as in a byte array), or, most commonly, a base-16 number.
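To make the "same number, different base" point concrete, here is a minimal sketch. The class and method names (DigestBases, md5AsNumber) are made up for illustration, and the input is encoded as UTF-8, which is an assumption; the original code used the platform default charset.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestBases {
    // Hypothetical helper: returns the MD5 digest of the text as a non-negative number.
    static BigInteger md5AsNumber(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            // Signum 1 tells BigInteger to treat the bytes as an unsigned magnitude.
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        BigInteger n = md5AsNumber("abc");
        // The same 128-bit value, written in three different bases.
        System.out.println("base 16: " + n.toString(16));
        System.out.println("base 10: " + n.toString(10));
        System.out.println("base 2:  " + n.toString(2));
    }
}
```

Note that toString(16) drops leading zeros, so a digest whose first hex digit is 0 comes out shorter than 32 characters unless it is padded.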

HalfBrian
A: 

Use an IDE that shows you where the "toString()" method is coming from. In most cases it's just from the Object class and won't be very useful. It's generally recommended to override the toString method to provide some clean output, but many classes don't do this.

xor_eq
Oh yes ... I use NetBeans. It has been IMMENSELY helpful in getting me as far as I am.
Brian
+1  A: 

Though I'm afraid that I have no experience whatsoever using Java to play with MD5 hashes, I can recommend Sun's Java Tutorials as a fantastic resource for learning Java. They go through most of the language, and helped me out a ton when I was learning Java.

Also look around for other posts asking the same thing and see what suggestions popped up there.

elmugrat
+1  A: 

BigInteger is used because the byte array is too long (16 bytes for MD5) to fit into an int or a long. However, if you do want to see everything in the byte array, there's an alternate approach. You could just replace the line:

String output = newBA.toString();

with:

String output = Arrays.toString(newBA);

This will print out the contents of the array, not the reference address.
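A small sketch of the difference, using four made-up sample bytes rather than a real digest. ArrayPrinting and contents are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

public class ArrayPrinting {
    // Hypothetical helper: formats the element values as signed decimal numbers.
    static String contents(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xDF, 0x01, 0x5D, (byte) 0x98};
        // Default Object.toString(): type name plus hash code, different on each run.
        System.out.println(sample.toString());
        // Element values; bytes are signed, so 0xDF shows up as -33.
        System.out.println(contents(sample)); // [-33, 1, 93, -104]
    }
}
```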

Justin Ardini
"String output = Arrays.toString(newBA)" -- not really helpful. It would show the array contents as a series of byte values e.g. "[-33, 1, 93, -104,..." instead of DF015D98...
Jason S
@Jason: You're right, in this specific application `Arrays.toString()` does not provide the best representation. I'll leave my answer because it *is* useful for most cases of printing out arrays.
Justin Ardini
+2  A: 

The key in this particular case is that you need to realize that bytes are not "human readable", but characters are. So you need to convert bytes to characters in a certain format. For arbitrary bytes like hashes, hexadecimal is usually used as the "human readable" format. Every byte is then converted to a 2-character hexadecimal string, and those strings are concatenated together.

This is unrelated to the language you use. You just have to understand/realize how it works "under the hood" in a language-agnostic way. You have to understand what you have (a byte array) and what you want (a hexstring). The programming language is just a tool to achieve the desired result. You just google the "functional requirement" along with the programming language you'd like to use to achieve the requirement. E.g. "convert byte array to hex string in java".


That said, the code example you found is more roundabout than it needs to be. It is more direct to examine each byte inside a loop and pad it with a zero when it is less than 0x10, instead of padding the whole string afterwards based on its length.

StringBuilder hex = new StringBuilder(bytes.length * 2);
for (byte b : bytes) {
    if ((b & 0xff) < 0x10) hex.append("0");
    hex.append(Integer.toHexString(b & 0xff));
}
String hexString = hex.toString();

Update: as per the comments on @extraneon's answer, using new BigInteger(byte[]) is also the wrong solution. It doesn't unsign the bytes. Bytes (like all primitive integer types in Java except char) are signed, so they have a negative range: a Java byte ranges from -128 to 127, while you want a range of 0 to 255 to get a proper hexstring. You basically just need to remove the sign to make them unsigned. The & 0xff in the above example does exactly that.

The hexstring as obtained from new BigInteger(bytes).toString(16) is NOT compatible with the result of all other hexstring-producing MD5 generators the world is aware of. They will differ whenever the leading byte of the MD5 digest is negative.
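The sign problem can be seen with two of the standard MD5 test inputs, "abc" (digest starts with 0x90) and "a" (digest starts with 0x0c). This is only a sketch: SignedVsUnsigned, md5 and perByteHex are illustrative names, and UTF-8 encoding of the input is an assumption.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignedVsUnsigned {
    // Hypothetical helper: MD5 digest of the UTF-8 bytes of the text (always 16 bytes).
    static byte[] md5(String text) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Per-byte conversion: mask off the sign, then always emit two hex digits.
    static String perByteHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] abc = md5("abc"); // leading byte 0x90 is negative as a Java byte
        System.out.println(new BigInteger(abc).toString(16));    // negative, starts with "-"
        System.out.println(new BigInteger(1, abc).toString(16)); // 900150983cd24fb0d6963f7d28e17f72
        System.out.println(perByteHex(abc));                     // 900150983cd24fb0d6963f7d28e17f72

        byte[] a = md5("a"); // leading byte is 0x0c
        System.out.println(new BigInteger(1, a).toString(16));   // 31 digits, leading zero dropped
        System.out.println(perByteHex(a));                       // 0cc175b9c0f1b6a831c399e269772661
    }
}
```

So the two-argument constructor with signum 1 avoids the minus sign, but its output still needs left-padding to 32 characters; the per-byte loop needs neither.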

BalusC
A: 

I wrongly assumed that the "toString" method on my newBA object would convert it to a readable output, but I was, apparently, wrong. How is a person supposed to know which way to go in Java?

You could replace Java here with any language that you don't know or haven't mastered yet. Even if you have worked 10 years in a specific language, you will still get those "Aha! This is the way it's working!" effects, though not as often as in the beginning.

The point you need to learn here is that toString() does not return the representation you want or expect, but whatever the implementer has chosen. The default implementation of toString() is like this (javadoc):

Returns a string representation of the object. In general, the toString method returns a string that "textually represents" this object. The result should be a concise but informative representation that is easy for a person to read. It is recommended that all subclasses override this method.

The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:

getClass().getName() + '@' + Integer.toHexString(hashCode())
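Applied to the byte array from the question, that formula explains the "[B@4cb162d5" output: "[B" is the runtime class name of byte[], and the rest is the array's hash code in hex. A minimal sketch follows; DefaultToString, defaultForm and sameAsBuiltIn are names invented for this example.

```java
public class DefaultToString {
    // Rebuilds the string that Object.toString() produces, following the javadoc formula.
    static String defaultForm(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    // True when the object's toString() is just the inherited default.
    static boolean sameAsBuiltIn(Object o) {
        return o.toString().equals(defaultForm(o));
    }

    public static void main(String[] args) {
        byte[] digestLike = new byte[16];
        System.out.println(digestLike.getClass().getName()); // [B
        System.out.println(digestLike.toString());           // [B@<hash code>, varies per run
        System.out.println(sameAsBuiltIn(digestLike));       // true: arrays do not override toString()
        System.out.println(sameAsBuiltIn("text"));           // false: String overrides toString()
    }
}
```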

MicSim
+2  A: 

You have actually successfully digested the message. You just don't know how to present the found digest value properly. What you have is a byte array. That's a bit difficult to read, and a toString of a byte array yields [B@somewhere which is not useful at all.

The BigInteger comes into it as a tool to format the byte array to a single number.

What you do is:

  • Construct a BigInteger with the proper value (in this case that value happens to be encoded in the form of a byte array: your digest)
  • Instruct the BigInteger object to return a String representation (i.e. plain, readable text) of that number in base 16 (i.e. hex)

And the while loop prefixes that value with 0-characters to get a width of 32. I'd probably use String.format for that, but whatever floats your boat :)
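For what it's worth, a String.format version could look roughly like the sketch below. FormatHex and toHex32 are made-up names, and the fixed width of 32 assumes a 16-byte digest such as MD5.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class FormatHex {
    // Formats the bytes as a zero-padded, 32-character lowercase hex string.
    // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
    static String toHex32(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest("a".getBytes(StandardCharsets.UTF_8));
        // The digest of "a" starts with 0x0c, so the padding matters here.
        System.out.println(toHex32(digest)); // 0cc175b9c0f1b6a831c399e269772661
    }
}
```

The %x conversion accepts a BigInteger directly, and the 0 flag with width 32 does the same job as the while loop.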

extraneon
Very cool ... thank you. I added the following lines and it spit out what I was looking for: BigInteger newBI = new BigInteger(newBA); String output = newBI.toString(16);
Brian
@Brian: this is also the wrong solution. It will return a negative(!!!) hexstring when the leading byte is negative. See my answer for the correct hexstring conversion approach. You'll also see that some users might suggest using `new BigInteger(bytes).abs().toString(16)` instead, but this is also fundamentally wrong. With a negative leading byte, this results in a wrong hexstring which is not convertible back to the *same* bytes, and thus cannot be shared with or compared against the other hexstring-producing MD5 generators the world is aware of.
BalusC
I'm not sure which part I have wrong. I constructed a BigInteger with "BigInteger newBI = new BigInteger(newBA);" and then I used toString(16) to return a human-readable, hexadecimal string. Where did I goof?
Brian
BalusC
The constructor I point to has a sign parameter, it's BigInteger(int sign, byte[] value). If you call it like new BigInteger(1, value) you should be OK I think.
extraneon
Indeed ... without the '1' signum, the hash of "-123" is "-35fbb7c73bebac9b46e0afbe17b90b7d", but with it, the hash is "ca044838c4145364b91f5041e846f483". Again, thank you for your help and input.
Brian
A: 

I'm also a newbie to development. For the current problem, I suggest the Book "Introduction To Cryptography With Java Applets" by David Bishop. It demonstrates what you need and so forth...

venJava
A: 

Any advice on how I can get better without having to "cheat" by Googling how to do something all the time?

By not starting out with an MD5 hasher! Seriously, work your way up little by little on programs that you can complete without worrying about domain-specific stuff like MD5.

If you're dumping everything into main, you're not programming Java.

In a program of this scale, your main() should do one thing: create an MD5Hasher object and then call some methods on it. You should have a constructor that takes an initial string, a method to "do the work" (update, digest), and a method to print the result.

Get some tutorials and spend time on simple, traditional exercises (a Fibonacci generator, a program to solve some logic puzzle), so you understand the language basics before bothering with the libraries, which is what you are struggling with now. Then you can start doing useful stuff.
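As a rough sketch of the structure described above (constructor, a method that does the work, a method that prints), something like the following would do. The method names compute, toHex and print are illustrative choices, as is UTF-8 for the string encoding; returning this from compute() is only a convenience for chaining.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MD5Hasher {
    private final String input;
    private byte[] digest; // null until compute() has run

    public MD5Hasher(String input) {
        this.input = input;
    }

    // Does the work: feeds the input to the digest and stores the result.
    public MD5Hasher compute() {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(input.getBytes(StandardCharsets.UTF_8));
            digest = md.digest();
            return this;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Converts the stored digest to hex, two digits per byte.
    public String toHex() {
        if (digest == null) {
            throw new IllegalStateException("call compute() first");
        }
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public void print() {
        System.out.println("The Answer Is: " + toHex());
    }

    public static void main(String[] args) {
        // main only wires things together.
        new MD5Hasher("Hello, World!").compute().print();
    }
}
```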

Paul Richter
A: 

How is a person supposed to know which way to go in Java? I have a background in C so this Java thing seems pretty weird. Any advice on how I can get better without having to "cheat" by Googling how to do something all the time?

Obvious answers are 1- google when you have questions (and it's not considered cheating imo) and 2- read books on the subject matter.

Apart from these two, I would recommend trying to find a mentor for yourself. If you do not have experienced Java developers at work, then try to join a local Java developer user group. You can find more experienced developers there and perhaps pick their brains to get answers to your questions.

CoolBeans
A: 

Thanks to all who read and replied. I guess I was hoping for a nifty Java "secret" that would keep me from having to jump to Google all the time.

Have a great day.

Brian