
So I'm just playing around with PHP and its MD5 functionality. Sorry if this sounds really silly, but I can't seem to understand: how is it possible to represent an unlimited number of characters of input as a 32-character (128-bit) output? Is my logic sound here? Or is there a limit to the input that an MD5 function can take?

Thanks...

+3  A: 

An MD5 hash does not represent the whole content: it's only... well, how to say that in non-technical terms? Let's say an MD5 is some kind of short summary of your content.

A given content will always produce the same MD5, and a single bit of difference in the content will almost always produce a very different MD5. That's why MD5 (like other hashing algorithms) is often used to check that a file has not been corrupted (during a transfer, for example).
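Both properties can be checked with a small sketch. It uses Python's `hashlib.md5`, whose `hexdigest()` has the same 32-character lowercase hex format as PHP's `md5()`; the helper names and sample strings are just for illustration.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """MD5 digest as 32 lowercase hex characters (same format as PHP's md5())."""
    return hashlib.md5(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 128 digest bits differ between two MD5 hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = b"The quick brown fox"
tampered = b"The quick brown fax"  # one character changed

# The same content always gives the same hash...
assert md5_hex(original) == md5_hex(original)

# ...while a tiny change gives an unrelated-looking hash.
print(md5_hex(original))
print(md5_hex(tampered))
print(differing_bits(md5_hex(original), md5_hex(tampered)), "of 128 bits differ")
```

For a well-behaved hash you would expect roughly half of the output bits to flip after a small input change, which is what makes a corrupted download easy to spot.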

But if you have an MD5, there is no way to get the content back: you cannot regenerate content from its summary.

Pascal MARTIN
It's not a summary. A summary is meant to provide information about the content, a hash is meant not to.
Matthew Flaschen
Yeah, "summary" might be too much of a ... shortcut, but I hope it'll help understand the idea.
Pascal MARTIN
+4  A: 

It's not possible. Like all hash functions, MD5 has collisions, but they're supposed to be unpredictable and useless to attackers. However, MD5 is thoroughly compromised: a group successfully used an MD5 collision to create a rogue certificate authority. Someone will note that there have been no preimage attacks in the wild, but I think it's time to bail on MD5.
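Why collisions must exist can be seen with a toy sketch: if MD5 is truncated to its first 16 bits (an assumption made only for this demo; real MD5 keeps all 128), a plain brute-force search finds two inputs with the same truncated hash almost immediately. This illustrates the pigeonhole principle; it is not how real MD5 collision attacks work.

```python
import hashlib

def truncated_md5(data: bytes, hex_chars: int = 4) -> str:
    """First hex_chars hex digits of the MD5 digest (4 hex chars = 16 bits)."""
    return hashlib.md5(data).hexdigest()[:hex_chars]

def find_truncated_collision(hex_chars: int = 4):
    """Hash b"0", b"1", b"2", ... until two inputs share a truncated digest.

    There are only 16 ** hex_chars possible truncated outputs, so the pigeonhole
    principle guarantees a repeat within 16 ** hex_chars + 1 distinct inputs;
    the birthday effect usually finds one much sooner.
    """
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()
        h = truncated_md5(msg, hex_chars)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
        i += 1

a, b, h = find_truncated_collision()
print(a, b, "both start with", h)
```

With the full 128-bit output the same generic search becomes astronomically expensive, which is why practical MD5 collisions instead rely on cryptanalytic weaknesses.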

Matthew Flaschen
+1  A: 

I think you may be confusing an MD5 'hash' with compression or encryption.

A hash code is just the product of a process that goes through data and generates a value that is likely to be unique for the given input. MD5 hashes don't contain all the data, just a probably-unique 'thumbprint' of it.
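A quick way to see that the output is a fixed-size thumbprint rather than a compressed copy: hash inputs of very different sizes and compare the digest lengths. This sketch uses Python's `hashlib`; the sample inputs are arbitrary.

```python
import hashlib

inputs = {
    "empty": b"",
    "short": b"hi",
    "one megabyte": b"x" * 1_000_000,
}

for label, data in inputs.items():
    digest = hashlib.md5(data)
    # digest_size is in bytes: 16 bytes = 128 bits, shown as 32 hex characters.
    print(f"{label:>13}: {len(data):>7} bytes in -> {digest.hexdigest()} "
          f"({digest.digest_size * 8} bits)")
```

Every line reports a 32-hex-character (128-bit) digest, whether the input is empty or a megabyte long.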

John Weldon
If my input data is less than 32 bits, would my MD5 hash then always be the "true" representation of my input and not a "thumbprint"?
chicane
In short: no. A hash is never a 'summary' or a 'compressed image'.
The MYYN
No. It's still a hash function, not a compression/encryption function.
John Weldon
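Tiny inputs illustrate the same point. The expected values below are the published test vectors from RFC 1321 (the MD5 specification): even a zero- or one-byte input produces a full 128-bit digest computed by MD5's rounds, not the input padded out or re-encoded.

```python
import hashlib

# Test vectors from RFC 1321 (the MD5 specification).
vectors = {
    b"": "d41d8cd98f00b204e9800998ecf8427e",
    b"a": "0cc175b9c0f1b6a831c399e269772661",
    b"abc": "900150983cd24fb0d6963f7d28e17f72",
}

for message, expected in vectors.items():
    actual = hashlib.md5(message).hexdigest()
    # A 1-byte input still yields a full 32-hex-character digest.
    assert actual == expected
    print(repr(message), "->", actual)
```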
+1  A: 
  • Analogy: Fingerprints.

  • How is it possible? Hash functions in general rely on the presence of certain properties ...

  • Is there a limit? Learn about md5 collision ...

The MYYN
A: 

The purpose of MD5 is not to be unique; rather, it can tell you whether a certain bit stream (a file, for example) was altered, either by transmission errors or on purpose. It is very unlikely that someone who changes a file will be able to come up with the same MD5 value, which is why download sites publish it so you can make sure you are getting the correct file.
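Checking a download against a published checksum can be sketched as follows. The function names are hypothetical. Feeding the file to `hashlib`'s `update()` in chunks also answers the "is there a limit?" part of the question in practice: MD5 consumes its input block by block (512 bits at a time), so the input never has to fit in memory at once.

```python
import hashlib

def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hash a file of any size in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare a file's MD5 with a checksum copied from a download page.

    Whitespace and upper/lower case in the published value are ignored.
    """
    return file_md5(path) == published_hex.strip().lower()
```

Usage would be something like `matches_published_checksum("downloaded.iso", checksum_from_site)`, where both arguments are placeholders. The chunk size only affects memory use, not the result.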

Otávio Décio
+1  A: 

It's possible to have a collision with any hashing algorithm. You simply can't represent all of the information in the amount of space that the hash uses. Otherwise we'd all be using hashing algorithms instead of compression algorithms.
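The counting argument can be made concrete. MD5 always outputs 128 bits, so already among inputs of exactly 17 bytes there are more messages than possible digests:

```latex
\underbrace{2^{8 \cdot 17}}_{\text{17-byte inputs}} = 2^{136} \;>\; \underbrace{2^{128}}_{\text{MD5 digests}}
\quad\Longrightarrow\quad
\frac{2^{136}}{2^{128}} = 2^{8} = 256 \text{ inputs per digest on average}
```

By the pigeonhole principle, at least two distinct 17-byte inputs must share a digest, and allowing longer inputs only makes the imbalance worse.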

The chances of hitting a collision by accident are very small. For things like passwords, the contents are usually very short. Other inputs with the same hash will likely be much longer, as well as gibberish. With an ISO, the colliding file might not even be bootable; an archive file probably won't be extractable.

There are several known techniques for finding MD5 collisions, and I'm sure other hashing algorithms have weaknesses too. I believe MD5 has collision problems where two specially crafted inputs can differ by only a small amount yet have the same hash, which is why a lot of people don't recommend using it.

Some places also store the file length (or content length). That helps a bit with preventing collision attacks.

AlReece45
A: 

It's not only possible but an unavoidable fact that there are many messages which will result in the same hash. These are usually referred to as collisions. But it's very, VERY hard to find them. A hash is just a function which generates a result that is effectively impossible to predict without knowing the original input.
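Since the output can't be predicted or run backwards, the only generic way to go from a hash to an input is to guess candidates and hash each one. A minimal sketch, assuming a tiny candidate space (strings of up to 4 decimal digits, a hypothetical PIN); it shows why short, low-entropy inputs such as weak passwords are easy to recover, and says nothing about hashes of longer, unpredictable inputs.

```python
import hashlib
from itertools import product
from string import digits

def brute_force_preimage(target_hex: str, alphabet: str = digits, max_len: int = 4):
    """Try every string over `alphabet` up to `max_len` characters; return one whose MD5 matches.

    With 10 digits and max_len=4 there are only 1 + 10 + 100 + 1000 + 10000 = 11,111
    candidates (including the empty string), so exhaustive search finishes instantly.
    """
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None  # no match in the searched space

secret = "4821"  # hypothetical 4-digit PIN
target = hashlib.md5(secret.encode()).hexdigest()
print(brute_force_preimage(target))
```

Each extra character multiplies the work by the alphabet size, which is why this approach stops being feasible for long inputs drawn from a large alphabet.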

Note that while some people (even while trying to answer your question) think that MD5 is insecure, the reality is that it is still more than adequate for most purposes, although I'd recommend one of the more recent hashes if you run PayPal or control the launch pads for a fleet of nuclear weapons.

(and before anyone starts flaming me with silly replies, tell me what I hashed to get: b958cf404456ceb1302015102ec57a64 )

C.

symcbean
It's not hard to find collisions, it can be done in a few hours (http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/). It is hard to find preimages, but that's a non-sequitur.
Matthew Flaschen
@Matthew Flaschen, yes, I'll settle for a collision for my hash.
symcbean
@symcbean, a collision attack means finding two different preimages that hash to the same value. The hash value isn't pre-specified. You're asking people to find a preimage for a pre-specified hash value. That's a first preimage attack.
Matthew Flaschen
Yawn, -1, but nobody has managed to 'break' my md5 hash yet.
symcbean