tags:

views:

178

answers:

2

I want to truncate an md5 hash to about half size. How much does that increase the odds of collisions? if I'm dealing with around 500 000 generations, should I be worried about a collision? what about 1m generations.

A: 

There's an interesting mathematical problem called the birthday problem that deals with that kind of situation. The fact is that the more entries you push in, the higher the chances to have a collision.

Following the table posted on the above link, assuming your digests are 64 bits each (since a single MD5 hash is 128 bits) and that MD5 have a uniform distribution, there is a very low chance that two hashes will collide. It becomes significant (1% chance or more) at 610,000,000 entries.

zneak
+1  A: 

The math you're looking for is on Wikipedia's birthday attack page.

We consider the following experiment. From a set of H values we choose n values uniformly at random thereby allowing repetitions. Let p(n; H) be the probability that during this experiment at least one value is chosen more than once. This probability can be approximated as

p(n;H) ~= 1-e^(-n^2/(2H))

With 128 bits the chance of a collision among 500,000 hash values is around 10-28. If you halve the size of the collision space then the chance of collision is around 10-9. That is, even though the chance is vastly greater it's still very, very low. It depends on how critical it is that there be no collisions. 10-9 is on the order of one in a billion, so while extremely unlikely it's within the realm of possibility.

For reference:

1028 = 10 octillion = 10 billion billion billion
109 = 1 billion

John Kugelman
someone once said "once in a million is next tuesday" :)
Stefano Borini
10^-9 would seriously worry me. MD5 in general has a worrisome number of collisions not to mention all the rainbow tables around these days.
Xorlev