I'm implementing an interface for a digital payment service called Suomen Verkkomaksut. The payment information is sent to them via an HTML form. To ensure that no one tampers with the information in transit, an MD5 hash is calculated at both ends using a secret key that is not itself transmitted.

My problem is that for some reason they seem to treat the incoming data as ISO-8859-1 rather than UTF-8. The hash that I send them is calculated from UTF-8 strings, so it differs from the hash they calculate.

I tried this with the following code:

String prehash = "6pKF4jkv97zmqBJ3ZL8gUw5DfT2NMQ|13466|123456||Testitilaus|EUR|http://www.esimerkki.fi/success|http://www.esimerkki.fi/cancel|http://www.esimerkki.fi/notify|5.1|fi_FI|0412345678|0412345678|[email protected]|Matti|Meikäläinen||Testikatu 1|40500|Jyväskylä|FI|1|2|Tuote #101|101|1|10.00|22.00|0|1|Tuote #202|202|2|8.50|22.00|0|1";
String prehashIso = new String(prehash.getBytes("ISO-8859-1"), "ISO-8859-1");

String hash = Crypt.md5sum(prehash).toUpperCase(); 
String hashIso = Crypt.md5sum(prehashIso).toUpperCase();

Unfortunately both hashes are identical with value C83CF67455AF10913D54252737F30E21. The correct value for this example case is 975816A41B9EB79B18B3B4526569640E according to Suomen Verkkomaksut's documentation.

Is there a way to calculate MD5 hash in Java with ISO-8859-1 strings?

UPDATE: While waiting for an answer from Suomen Verkkomaksut, I found an alternative way to compute the hash. Michael Borgwardt corrected my understanding of Strings and encodings, so I looked for a way to compute the hash from a byte[].

Apache Commons is an excellent source of libraries, and I found its DigestUtils class, which has an md5Hex method that takes a byte[] and returns a 32-character hex string.

For some reason this still doesn't work. Both of these return the same value:

DigestUtils.md5Hex(prehash.getBytes());
DigestUtils.md5Hex(prehash.getBytes("ISO-8859-1"));
+1  A: 

If you send UTF-8 encoded data that they treat as ISO-8859-1, that could be the source of your problem. I suggest you either send the data in ISO-8859-1 or tell Suomen Verkkomaksut that you're sending UTF-8. In an HTTP-based protocol you do this by adding charset=utf-8 to the Content-Type HTTP header.

A way to rule out some issues would be to try a prehash String that only contains characters that are encoded the same in UTF-8 and ISO-8859-1. From what I can see you can achieve this by removing all "ä" characters from the string you've used.
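As a rough sketch of that check (the class and method names are made up for illustration), the following compares the bytes a string yields under each charset. Only strings limited to ASCII characters come out identical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the question's code.
class CharsetAgreement {
    // True when the string encodes to exactly the same bytes in UTF-8 and ISO-8859-1,
    // which is the case only when every character is plain ASCII.
    static boolean sameBytesInBothCharsets(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source file's own encoding does not matter.
        System.out.println(sameBytesInBothCharsets("Testikatu 1"));           // true
        System.out.println(sameBytesInBothCharsets("Meik\u00E4l\u00E4inen")); // false
    }
}
```

If a test prehash passes this check and the hashes still disagree, the charset is probably not the only difference.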

Buhb
I already have both <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> and <?xml version="1.0" encoding="UTF-8"?> on the page. Unfortunately these don't seem to help. But you're right, maybe I should just contact them.
Ville Salonen
+8  A: 

You seem to misunderstand how string encoding works, and your Crypt class's API is suspect.

Strings don't really "have an encoding" - an encoding is what you use to convert between Strings and bytes.

Java Strings are internally stored as UTF-16, but that does not really matter, as MD5 works on bytes, not Strings. Your Crypt.md5sum() method has to convert the Strings it's passed to bytes first - what encoding does it use to do that? That's probably the source of your problem.

Your example code is pretty nonsensical as the only effect this line has:

String prehashIso = new String(prehash.getBytes("ISO-8859-1"), "ISO-8859-1");

is to replace characters that cannot be represented in ISO-8859-1 with question marks.
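To make this concrete, here is a small sketch (the class and method names are invented for illustration). It shows that the charset only matters at the moment a String is turned into bytes, and that the round trip through ISO-8859-1 leaves representable text unchanged:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the question's code.
class EncodingDemo {
    // Encode to ISO-8859-1 and decode again, as the prehashIso line does.
    static String roundTripLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Jyväskylä", written with Unicode escapes so the source file encoding is irrelevant.
        String city = "Jyv\u00E4skyl\u00E4";

        // Same String, different byte sequences depending on the charset.
        System.out.println(city.getBytes(StandardCharsets.ISO_8859_1).length); // 9
        System.out.println(city.getBytes(StandardCharsets.UTF_8).length);      // 11

        // Every character fits in ISO-8859-1, so the round trip changes nothing.
        System.out.println(roundTripLatin1(city).equals(city)); // true

        // The euro sign does not exist in ISO-8859-1 and comes back as '?'.
        System.out.println(roundTripLatin1("10 \u20AC")); // 10 ?
    }
}
```

Since the round-tripped String equals the original, hashing it the same way can only give the same result.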

Michael Borgwardt
Thanks for the clarification.
Ville Salonen
+1 on the suspicious-ness of the `Crypt` class. It also suggests there may be a confusion between encryption and cryptographic hashing (but there may as well not be one, depending on the rest of the class).
Romain
+2  A: 

Java has a standard java.security.MessageDigest class for calculating different hashes.

Here is some sample code:

import java.security.MessageDigest;

// Exception handling not shown

String prehash = ...

final byte[] prehashBytes= prehash.getBytes( "iso-8859-1" );

System.out.println( prehash.length( ) );
System.out.println( prehashBytes.length );

final MessageDigest digester = MessageDigest.getInstance( "MD5" );

digester.update( prehashBytes );

final byte[] digest = digester.digest( );

final StringBuffer hexString = new StringBuffer();

for ( final byte b : digest ) {
    final int intByte = 0xFF & b;

    // Values below 0x10 (16) yield a single hex digit and need a leading zero
    if ( intByte < 0x10 )
    {
        hexString.append( "0" );
    }

    hexString.append(
        Integer.toHexString( intByte )
    );
}

System.out.println( hexString.toString( ).toUpperCase( ) );

Unfortunately for you it produces the same "C83CF67455AF10913D54252737F30E21" hash. So, I guess your Crypt class is exonerated. I specifically added the prehash and prehashBytes length printouts to verify that 'ISO-8859-1' is indeed used. In this case both are 328.

When I did prehash.getBytes( "utf-8" ) it produced "9CC2E0D1D41E67BE9C2AB4AABDB6FD3" (and the length of the byte array became 332). Again, not the result you are looking for.

So, I guess Suomen Verkkomaksut does some massaging of the prehash string that they did not document, or that you have overlooked.
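Separately from the question of what the service actually hashes, a more compact variant of the conversion might look like the sketch below. The class and method names are invented here; it passes the charset explicitly and formats each byte with String.format so that every byte yields exactly two hex digits:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not the Crypt class from the question.
class Md5Hex {
    // Convert bytes to uppercase hex, always two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Encode the text with the given charset, then hash the resulting bytes.
    static String md5Hex(String text, Charset charset) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return toHex(md5.digest(text.getBytes(charset)));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Jyväskylä" with Unicode escapes; the two charsets give different hashes.
        String sample = "Jyv\u00E4skyl\u00E4";
        System.out.println(md5Hex(sample, StandardCharsets.ISO_8859_1));
        System.out.println(md5Hex(sample, StandardCharsets.UTF_8));
    }
}
```

A quick sanity check for any such routine is that the result is always 32 characters long.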

Alexander Pogrebnyak
Your hash function doesn't pad with zero if byte is less than 10.
BalusC
Ah well, maybe I'll just have to wait for an answer from them. Thanks for the provided code example.
Ville Salonen
@BalusC. You are quite right. I've corrected my example. It always beats me why Java does not have Byte.toHexString and Byte.toUpperHexString methods that do the correct thing.
Alexander Pogrebnyak
Simply use the Hex class of Apache Commons Codec, which does exactly that. I had to rebuild a HUGE number of hashes because I used my own, broken, implementation of the byte[] to String conversion.
Malax