views:

72

answers:

3

Here's 3 example md5 hashes

$ md5 -s "1" && md5 -s "2" && md5 -s "3"
MD5 ("1") = c4ca4238a0b923820dcc509a6f75849b
MD5 ("2") = c81e728d9d4c2f636f067f89cc14862c
MD5 ("3") = eccbc87e4b5ce2fe28308fd9f2a7baf3

Say I wanted to take 8 characters from any hash. Is the beginning part of the hash particularly more "random" than the end? middle? Or are all substrings equally "random"?

+6  A: 

All substrings of a good hash (and md5 is reasonably good despite being cryptographically unsafe) are equally random, so yes, take any bits you like from the string, they should be equally distributed.

Gintautas Miliauskas
+2  A: 

I was curious myself, so I went ahead and wrote a program to test this. You'll need Crypto++ to compile the code.

Disclaimer: When it comes to cryptography, or even just mathematics in general, I know just enough to shoot myself in the foot. So, take the following results with a grain of salt and keep in mind that I only have a cursory knowledge of the tools I'm using.

I only sampled three substrings: the first 8 bytes, the middle 8 bytes, and the last 8 bytes. Long story short, they're equally random.

However, when using a smaller sample space, it appears as if the last 8 bits are slightly more random. The larger the sampling space, the closer all three substrings approach complete randomness.


1000 iterations:

First:  0.995914
Middle: 0.996546
Last:   0.998104

5000 iterations:

First:  0.998387
Middle: 0.998624
Last:   0.999501

10000 iterations:

First:  0.999614
Middle: 0.999457
Last:   1

30000 iterations:

First:  1
Middle: 1
Last:   1

"Randomness" is measured by Crypto++'s MaurerRandomnessTest class. For reference, the executable compiled from the above code has a randomness value of 0.632411 and a copy of Shakespeare's Macbeth downloaded from Project Gutenburg has a randomness value of 0.566991.

kurige
I'm marking this accepted as it actually demonstrates the "randomness." Thanks @kurige!
macek
+2  A: 

Nitpick: "random" is the wrong word to use here, since hash functions are deterministic.

As for answering what you mean :), a desirable property of hash functions is achieving the Avalanche effect: basically, to have every bit of input cause drastic changes to the output. So, for a well-designed hash, every substring should be affected equally often ("be as random") as any other.

snemarch
@snemarch, I put the word random in quotes for this very reason :) +1 for linking to the avalanche effect.
macek