views:

2438

answers:

10

Is there a fixed point in the MD5 transformation, i.e. does there exist x such that md5(x) == x?

A: 

Probably, but finding it would take longer than we have or would involve compromising MD5.

Glomek
Well, it's not like that hasn't been done already!
It hasn't been broken. All they've been able to do is, in a reasonable amount of time produce 2 strings that equate the the same hash. It's still very hard to produce a string which will equate to a specific hash.
Kibbee
not sure how finding one would compromise md5, anymore than it would compromise the algorithm if I told you MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6
Kip
A fixed point would probably give some leverage on the maths that could lead to a more comprehensive breach of MD5. I'm not convinced that Glomek can really justify 'probably'; I would accept 'possibly' without equivocation.
Jonathan Leffler
+7  A: 

Since the hash is irreversible, this would be very hard to figure out. The only way to solve this, would be to calculate the hash on every possible output of the hash, and see if you came up with a match.

To elaborate, there are 16 bytes in an MD5 hash. That means there are 2^(16*8) = 3.4 * 10 ^ 38 combinations. If it took 1 millisecond to compute a hash on a 16 byte value, it would take 10790283070806014188970529154.99 years to calculate all those hashes.

Kibbee
True, if you had to try *every one*. But you would only have to try every possible input to verify that there was no fixed point. If there is a fixed point (and Adam Rosenfield's answer suggests that there might be) then one lucky guess is all that's needed.
Naaff
The function is irreversible in the sense that it has no mathematical inverse, however this only means that for a given output there may be more than one input. In general, the space of inputs for a given output would be infinite, but if you know it started as a 128-bit value you can narrow down the possibilities. There is a chance for "working backwards" if you don't treat the function as a black box, but instead read the spec and apply some mathematical thinking.
rndmcnlly
@Naaff: "only have to try every possible input" - and this is easier than to try every hash, how? Quite the opposite, since several possible inputs may hash into the same output.
Piskvor
@Piskvor: You misunderstood what Naaff meant (it took me a minute, too). A clearer way to say it would be "Only if there is no fixed point will you have try try every possible input (from the space 2^128)". In other words, you only have to try every possibility if none before that works. So 1.08e28 years, or one lucky guess!
P Daddy
+35  A: 

Since an MD5 sum is 128 bits long, any fixed point would necessarily also have to be 128 bits long. Assuming that the MD5 sum of any string is uniformly distributed over all possible sums, then the probability that any given 128-bit string is a fixed point is 1/2^128.

Thus, the probability that no 128-bit string is a fixed point is (1 - 1/2^128)^(2^128), so the probability that there is a fixed point is 1 - (1 - 1/2^128)^(2^128).

Since the limit as n goes to infinity of (1 - 1/n)^n is 1/e, and 2^128 is most certainly a very large number, this probability is almost exactly 1 - 1/e = 63.21%.

Of course, there is no randomness actually involved - there either is a fixed point or there isn't. But, we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace - if MD5 sums were 32 bits or 1024 bits, the answer would be the same, so long as it's larger than about 4 or 5 bits).

Adam Rosenfield
Can you actually make the assumption that the MD5 sum of any string is uniformly distributed over all possible sums?
Ori Pessach
Yes. Large numbers and modulous form a roughly random distribution. If they didn't, you would have constant collisions. The nature of md5 forces its output to be distributed randomly.
Stefan Kendall
A: 

Strictly speaking, since the input of MD5 is 512 bits long and the output is 128 bits, I would say that's impossible by definition.

Ori Pessach
No, the MD5 of 1 byte strings exists.
Joshua
The input can be any size. If the input is less than 512 bytes then it is padded, but small inputs are still acceptable. From Wikipedia: "MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks (sixteen 32-bit little endian integers); the message is padded so that its length is divisible by 512."
Naaff
So you're assuming that, say, 0000000001 = 1 ? I would argue then that the question is poorly specified, at best.
Ori Pessach
The *input* to MD5 can be 128 bits. If MD5 wants to pad that input then, well, frankly, that's MD5's business. The input is still well defined. Likewise, the output is a well-defined 128 bits. If the (well-defined) input and the (well-defined) output are both the same then MD5(x)=x.
Naaff
But the padding is not well defined in the context of the transformation itself. Who says you can't pad with all 1s, or a randomly chosen, repeating bit pattern? Well, the RFC says that, but only in the context of extending a function from 512 bit numbers to 128 bit numbers to produce 128 bit numbers for arbitrarily long (or short) inputs. It only serves to further illustrate how poorly specified the question is.
Ori Pessach
@Joshua the MD5 of an empty string (i.e. 0 bytes) even exists
Kip
A: 

There are two interpretations, and if one is allowed to pick either, the probability of finding a fixed point increases to 81.5%.

  • Interpretation 1: does the MD5 of a MD5 output in binary match its input?
  • Interpretation 2: does the MD5 of a MD5 output in hex match its input?
Joshua
There's nothing about the MD5 algorithm that implies hex - it operates on bytes, and produces bytes - so I think the latter interpretation is invalid.
Nick Johnson
Whether or not there is a fixed point under interpretation 1, there could still be (or not be) one under interpretation 2. However, if you are interested in exploring the problem, interpretation 1 seems like a much better place to start because you won't have to make all sorts of arbitrary decisions about casing and character encoding. On top of that, the binary case has fewer bits!
rndmcnlly
You are misinterpreting what the hex really is. You can represent binary in hex, just as you can represent it in decimal or octal or base 3. It's a number, and has different representations. So, interpretation 1 and 2 are the same thing. What you're thinking of is the character string representation, which is not the same hex at all but is an entirely different binary value. In fact you could have many different hex strings in different character sets. The 128-bit hash value can be represented as a "hex" string, but it is not equal to the string. The string is not the same binary data.
Dustin Fineout
Dustin, interpretation 2 really does mean the MD5 of the display string.
Joshua
There is a huge problem with that idea though, in that it is directly dependent on your character encoding. Different encoding schema are going to result in entirely different result sets. There's even an entire project and an article debunking it based on this misunderstanding of how MD5 operates http://acodingfool.typepad.com/blog/2009/05/the-kembler-identity.html
Dustin Fineout
+1  A: 

While I don't have a yes/no answer, my guess is "yes" and furthermore that there are maybe 2^32 such fixed points (for the bit-string interpretation, not the character-string intepretation). I'm actively working on this because it seems like an awesome, concise puzzle that will require a lot of creativity (if you don't settle for brute force search right away).

My approach is the following: treat it as a math problem. We have 128 boolean variables, and 128 equations describing the outputs in terms of the inputs (which are supposed to match). By plugging in all of the constants from the tables in the algorithm and the padding bits, my hope is that the equations can be greatly simplified to yield an algorithm optimized to the 128-bit input case. These simplified equations can then be programmed in some nice language for efficient search, or treated abstractly again, assigning single bits at a time, watching out for contraditions. You only need to see a few bits of the output to know that it is not matching the input!

rndmcnlly
+3  A: 

This is a really interesting question, and a number of developers have contributed to test programs together to figure this out, all headed up by the awesome Elliott Kember. Check it out at "The Kember Identity" for more.

Jamie Rumbelow
+2  A: 

According to http://ask.metafilter.com/50343/MD5-and-the-probability-of-collisions :

Slim. An md5 hash is 128 bits long, so assuming that all hashes have an equal chance of occuring, the odds of any two random strings hashing to the same value are 1 in 2^128.

2 to the power of 128 is a pretty big number.

Colen
Thats not what he is asking. I think he is asking what are the chance the md5 hash will be the same as what was hashed.
Iznogood
1 in 340282366920938463463374607431768211456 basically :P
Ninefingers
@Iznogood The probabily of `md5(x) == x` is essentially identical to the probability that `md5(x) == y` for any `y`... otherwise md5 wouldn't be a very good cryptographic hash.
Tyler McHenry
@Hey Just explaining what kyle is asking.
Iznogood
And you're right -- Colen's quote isn't precisely on point. But the value of 2^128 is the correct answer.
Tyler McHenry
How big a number ... http://stackoverflow.com/questions/1705008/simple-proof-that-guid-is-not-unique (for comparison's sake :-) )
Sean Vieira
+1  A: 

Someone else is looking for such a value also. See http://elliottkember.com/kember_identity.html. If the math one the linked page is correct, there's a 67% chance that such a value exists, but no one has found one yet, and a brute force search is computationally infeasable.

Jason Jenkins
That's not quite the same question either. That's asking what the probability is that there exists an `x` such that `md5(x)==x`, not what the probability is that, given `x`, `md5(x)==x`.
Tyler McHenry
A: 

Other people wondered this as well, and both this thread and this article give the same 0.37 chances that there is NO string that will hash to itself using md5. Which in reverse means there is actually a chance that a string would eventually md5 to itself.

FreekOne