Is there a fixed point in the MD5 transformation, i.e. does there exist x such that md5(x) == x
?
views:
2438answers:
10Probably, but finding it would take longer than we have or would involve compromising MD5.
Since the hash is irreversible, this would be very hard to figure out. The only way to solve this, would be to calculate the hash on every possible output of the hash, and see if you came up with a match.
To elaborate, there are 16 bytes in an MD5 hash. That means there are 2^(16*8) = 3.4 * 10 ^ 38 combinations. If it took 1 millisecond to compute a hash on a 16 byte value, it would take 10790283070806014188970529154.99 years to calculate all those hashes.
Since an MD5 sum is 128 bits long, any fixed point would necessarily also have to be 128 bits long. Assuming that the MD5 sum of any string is uniformly distributed over all possible sums, then the probability that any given 128-bit string is a fixed point is 1/2^128.
Thus, the probability that no 128-bit string is a fixed point is (1 - 1/2^128)^(2^128), so the probability that there is a fixed point is 1 - (1 - 1/2^128)^(2^128).
Since the limit as n goes to infinity of (1 - 1/n)^n is 1/e, and 2^128 is most certainly a very large number, this probability is almost exactly 1 - 1/e = 63.21%.
Of course, there is no randomness actually involved - there either is a fixed point or there isn't. But, we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace - if MD5 sums were 32 bits or 1024 bits, the answer would be the same, so long as it's larger than about 4 or 5 bits).
Strictly speaking, since the input of MD5 is 512 bits long and the output is 128 bits, I would say that's impossible by definition.
There are two interpretations, and if one is allowed to pick either, the probability of finding a fixed point increases to 81.5%.
- Interpretation 1: does the MD5 of a MD5 output in binary match its input?
- Interpretation 2: does the MD5 of a MD5 output in hex match its input?
While I don't have a yes/no answer, my guess is "yes" and furthermore that there are maybe 2^32 such fixed points (for the bit-string interpretation, not the character-string intepretation). I'm actively working on this because it seems like an awesome, concise puzzle that will require a lot of creativity (if you don't settle for brute force search right away).
My approach is the following: treat it as a math problem. We have 128 boolean variables, and 128 equations describing the outputs in terms of the inputs (which are supposed to match). By plugging in all of the constants from the tables in the algorithm and the padding bits, my hope is that the equations can be greatly simplified to yield an algorithm optimized to the 128-bit input case. These simplified equations can then be programmed in some nice language for efficient search, or treated abstractly again, assigning single bits at a time, watching out for contraditions. You only need to see a few bits of the output to know that it is not matching the input!
This is a really interesting question, and a number of developers have contributed to test programs together to figure this out, all headed up by the awesome Elliott Kember. Check it out at "The Kember Identity" for more.
According to http://ask.metafilter.com/50343/MD5-and-the-probability-of-collisions :
Slim. An md5 hash is 128 bits long, so assuming that all hashes have an equal chance of occuring, the odds of any two random strings hashing to the same value are 1 in 2^128.
2 to the power of 128 is a pretty big number.
Someone else is looking for such a value also. See http://elliottkember.com/kember_identity.html. If the math one the linked page is correct, there's a 67% chance that such a value exists, but no one has found one yet, and a brute force search is computationally infeasable.
Other people wondered this as well, and both this thread and this article give the same 0.37 chances that there is NO string that will hash to itself using md5. Which in reverse means there is actually a chance that a string would eventually md5 to itself.