I have this pseudo-code in Java:

byte[] hash = MD5.hash("example");

String hexString = toHexString(hash); // This returns something like a0394dbe93f

byte[] hexBytes = hexString.getBytes("UTF-8");

Now, hexBytes[] and hash[] are different.

I know I'm doing something wrong, since hash.length is 16 and hexBytes.length is 32. Maybe it has something to do with Java using Unicode for chars (just a wild guess here).

Anyway, the question is: how do I get the original hash[] array back from hexString?

The whole code is here if you want to look at it (it's ~ 40 LOC) http://gist.github.com/434466

The output of that code is:

16
[-24, 32, -69, 74, -70, 90, -41, 76, 90, 111, -15, -84, -95, 102, 65, -10]
32
[101, 56, 50, 48, 98, 98, 52, 97, 98, 97, 53, 97, 100, 55, 52, 99, 53, 97, 54, 102, 102, 49, 97, 99, 97, 49, 54, 54, 52, 49, 102, 54]

Thanks a lot!

+2  A: 

You haven't shown toHexString, but basically you need the reverse equivalent - look for a method called fromHexString or something similar.

String.getBytes() performs a normal character encoding (in this case UTF-8): each character becomes the bytes that represent that character. What you want is to decode the text, which is a textual representation of arbitrary binary data, into a byte[].

Apache Commons Codec has appropriate methods - the API isn't ideal, but it would work:

byte[] data = ...;
String hex = Hex.encodeHexString(data);
...

// decode(Object) is an instance method and throws DecoderException
byte[] decoded = (byte[]) new Hex().decode(hex);
Jon Skeet
@Jon, the code is in the gist I linked http://gist.github.com/434466 (names vary though). Thanks, I'll look into Apache Commons
Pablo Fernandez
Just curious... why do you say that the API isn't ideal?
Pablo Fernandez
@Pablo: Ideally there should be a Hex.decode method taking a String and returning a byte array, strongly typed. The `Object decode(Object)` signature is annoying.
Jon Skeet
Oh, I just downloaded the latest version, and it takes a char array and returns a byte array.
Pablo Fernandez
@Pablo: There's an overload that does that, yes - but it's annoying to have to convert to a char array just for this.
Jon Skeet
+2  A: 

With hexString.getBytes("UTF-8") you are just getting the bytes of the characters in the hex string, not converting the hex digits to their byte values.

That is, you need to write the reverse of your toHexString function. Your toHexString should also make sure to pad values below 16 (0x10) to 2 digits, so that e.g. the byte 9 ends up as "09" and not "9". Otherwise the string cannot be split back into one pair of digits per byte.
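
A minimal sketch of an encoder that pads every byte to two digits. The class and method names here are made up for illustration; the toHexString in the gist may look different.

```java
public class HexEncode {
    // Hypothetical helper, not the asker's code: encodes each byte as exactly two hex digits.
    static String toHexString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes are not sign-extended;
            // %02x pads single-digit values with a leading zero.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHexString(new byte[] {9, -24, 32})); // prints 09e820
    }
}
```

With fixed-width output, a 16-byte hash always yields a 32-character string, so the decoder can read it back two characters at a time.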

nos
+1  A: 

getBytes() doesn't parse hexadecimal characters; it applies a character encoding. In other words, it doesn't turn "0A" into 0x0A, but into 0x30 0x41, because that's how the characters '0' and 'A' are encoded. You want Integer.parseInt(String, radix) instead in your function, with a radix of 16.
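
The difference can be seen side by side in a small sketch (the class and method names are invented for this example):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeVsParse {
    // Character encoding: one byte per ASCII character of the string.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hex parsing: two hex digits become a single byte value.
    static byte parseHexPair(String twoDigits) {
        return (byte) Integer.parseInt(twoDigits, 16);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("0A"))); // prints [48, 65], i.e. 0x30 0x41
        System.out.println(parseHexPair("0A"));            // prints 10, i.e. 0x0A
    }
}
```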

Kilian Foth
A: 

If you don't want to use a library, here is how you can do it with my version of the hex decoder:

byte[] hexBytes = dehexify(hexString);

public static byte[] dehexify(String hexString) {
    if (hexString.length()%2 == 1)
        throw new IllegalArgumentException("Invalid length");       
    int len = hexString.length()/2;
    byte[] bytes = new byte[len];
    for (int i=0; i<len; i++) {
        int index = i*2;
        bytes[i] = (byte)Integer.parseInt(hexString.substring(index, index+2), 16);
    }
    return bytes;
}
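
One thing to be aware of: Integer.parseInt accepts a leading sign, so a pair such as "-1" parses without error. If that matters, a variant based on Character.digit rejects sign characters. This is only a sketch with invented names, not a replacement for the code above:

```java
import java.util.Arrays;

public class StrictHex {
    // Hypothetical variant: decodes two hex digits per byte and rejects anything
    // that Character.digit does not recognise as a base-16 digit (including '+' and '-').
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd length: " + hex.length());
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Non-hex character near index " + (2 * i));
            }
            // Combine the high and low nibble into one byte.
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // First four bytes of the hex string from the question's output.
        System.out.println(Arrays.toString(decode("e820bb4a"))); // prints [-24, 32, -69, 74]
    }
}
```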
ZZ Coder