I recently had a problem with a crypto library that produced bad MD5 output: instead of 32 hex digits it returned 30.

As we don't use unit tests, this problem was quite a headache to solve, because we assumed that the MD5 string was correct and looked for bugs in other places.

That made me realize the real value of unit tests (unit tests first, TDD later).

But I'm not sure how to test cryptographic methods well enough. How do you get proper expected values?

EDIT: Thanks for the answers, I think I didn't explain it enough.

The problem was with a third-party tool which produces bad MD5 output. So how do you get that assert value? I know it mustn't change; I just don't get how to obtain it from a reliable source.

+3  A: 

The MD5 hash of a given input is always the same. So you can hash a known string and assert that the result equals the value you know to be correct.

assert_equals encode("str"), "341be97d9aff90c9978347f66f945b77"

The encoded value of "str" should always be "341be97d9aff90c9978347f66f945b77".
If your encoding returns that value, it works well. Otherwise, there's a problem.
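
A minimal Python sketch of that kind of assertion, assuming the standard library's hashlib stands in for the implementation under test (the question doesn't name its library, so `md5_hex` is a hypothetical wrapper). The expected value for "abc" is the one published in RFC 1321. The extra format check is the kind of thing that would flag a 30-digit result straight away.

```python
import hashlib
import re


def md5_hex(text: str) -> str:
    """Hypothetical wrapper around the MD5 implementation under test."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def check_md5_abc() -> None:
    digest = md5_hex("abc")
    # An MD5 digest is 128 bits, i.e. exactly 32 hexadecimal digits.
    assert re.fullmatch(r"[0-9a-f]{32}", digest), f"malformed digest: {digest!r}"
    # Known-answer check against the value published in RFC 1321.
    assert digest == "900150983cd24fb0d6963f7d28e17f72"


check_md5_abc()
```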

Damien MATHIEU
+1  A: 

The basic premise of unit testing is to run a method with some data where you know in advance what the output will be.

So to test an encryption method you need to generate some matching pairs of input and output data. Take a data string, say "This is some test data". Encrypt it using a 3rd party encryption tool or library, to get "Guvf vf fbzr grfg qngn".

Now you have a pair of input data with its expected output.

Write your unit test to pass in the input data, and validate that the output matches your predetermined expectation. Your input and expected output data can be hard coded into the unit test as strings (or read from a database if you want to do lots of pairs).

Although hard-coded values are usually discouraged in general programming, it's normally considered good practice to run unit tests only with predetermined, planned and repeatable data. Running unit tests with randomly generated strings is considered bad practice because it means your unit tests aren't repeatable.

Obviously, the theory is the same for an MD5 method: just take some sample data, run it through a 3rd party MD5 hash tool and then use the input/output data pairs to validate that your method gives the correct output.
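
As a rough illustration, here is how hard-coded pairs might look in Python. The sample output above happens to be ROT13, so ROT13 stands in for a real cipher here; `encrypt`, `KNOWN_PAIRS` and `failed_pairs` are hypothetical names made up for this sketch, not part of any particular library.

```python
import codecs

# Hard-coded (input, expected output) pairs, prepared ahead of time
# with an independent tool. The pair below is the one from the text.
KNOWN_PAIRS = [
    ("This is some test data", "Guvf vf fbzr grfg qngn"),
]


def encrypt(plaintext: str) -> str:
    """Hypothetical method under test; ROT13 stands in for a real cipher."""
    return codecs.encode(plaintext, "rot_13")


def failed_pairs(func, pairs):
    """Return (input, expected, actual) for every pair that does not match."""
    return [(src, want, func(src)) for src, want in pairs if func(src) != want]


# The test passes only if every predetermined pair matches.
assert failed_pairs(encrypt, KNOWN_PAIRS) == []
```

Because the data is fixed, a failure is reproducible on every run, which is the point about repeatability made above.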

Simon P Stevens
+5  A: 

Known-correct data for cryptographic algorithms are usually called test vectors. So search for "MD5 test vectors" to get a ton of good input data for your tests.

The most authoritative resource for test vectors is of course the document defining the algorithm. Most standards documents will include a set of test vectors. For instance, RFC 1321 contains the following set of test data:

MD5 ("") = d41d8cd98f00b204e9800998ecf8427e
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("abc") = 900150983cd24fb0d6963f7d28e17f72
MD5 ("message digest") = f96b697d7cb7938d525a2f31aaf161d0
MD5 ("abcdefghijklmnopqrstuvwxyz") = c3fcd3d76192e4007dfb496cca67e13b
MD5 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = d174ab98d277d9f5a5611c2c9f419d9f
MD5 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 57edf4a22be3c955ac49da2e2107b67a
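
The vectors above drop straight into a table-driven test. A sketch in Python, assuming hashlib as the implementation being checked (swap in your own library's call inside the hypothetical `md5_hex` helper):

```python
import hashlib

# Test vectors copied from the RFC 1321 list above.
RFC_1321_VECTORS = {
    "": "d41d8cd98f00b204e9800998ecf8427e",
    "a": "0cc175b9c0f1b6a831c399e269772661",
    "abc": "900150983cd24fb0d6963f7d28e17f72",
    "message digest": "f96b697d7cb7938d525a2f31aaf161d0",
    "abcdefghijklmnopqrstuvwxyz": "c3fcd3d76192e4007dfb496cca67e13b",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789":
        "d174ab98d277d9f5a5611c2c9f419d9f",
    # The last input is "1234567890" repeated eight times (80 characters).
    "1234567890" * 8: "57edf4a22be3c955ac49da2e2107b67a",
}


def md5_hex(text: str) -> str:
    """Hypothetical wrapper; replace the body with the library under test."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def mismatches(vectors):
    """Return the inputs whose digest differs from the published value."""
    return [msg for msg, want in vectors.items() if md5_hex(msg) != want]


assert mismatches(RFC_1321_VECTORS) == []
```
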
Rasmus Faber
A: 

As others already mentioned, either published test vectors or a published reference implementation should be a good source of reliable test values.

One more thing I would like to add: if at all possible, please don't use MD5. There are so many known security issues with it already that maintaining compatibility with some old system is the only reason to use it in new code.

If you can, use SHA-256 (defined in FIPS 180-2, which includes a set of test vectors). If you don't need all 256 bits of the hash, truncate it to 128 bits; you will still have a much more secure alternative to MD5.
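
A short sketch of the truncation idea in Python, assuming hashlib and a hypothetical helper name `sha256_128`. The "abc" digest used as the expected value is the one-block example from the FIPS 180 document.

```python
import hashlib


def sha256_128(data: bytes) -> bytes:
    """Return the first 128 bits (16 bytes) of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:16]


# Known-answer check for the full digest of "abc".
full = hashlib.sha256(b"abc").hexdigest()
assert full == "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

# The truncated value is simply the first 32 hex digits of the full digest.
assert sha256_128(b"abc").hex() == full[:32]
```

The truncated result has the same length as an MD5 digest, so it can be tested the same way: with fixed inputs and expected values derived from published vectors.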

Krystian