views:

3710

answers:

13

Hey everyone,

One concept I've always wondered about is the use of cryptographic hash functions and values. I understand that these functions can generate a hash value that is unique and virtually impossible to reverse, but here's what I've always wondered:

If on my server, in PHP I produce: md5("stackoverflow.com") = "d0cc85b26f2ceb8714b978e07def4f6e"

When you run that same string through an MD5 function, you get the same result on your PHP installation. A process is being used to produce some value, from some starting value.

Doesn't this mean that there is some way to deconstruct what is happening and reverse the hash value?

What is it about these functions that makes the resulting strings impossible to retrace?

Cheers,

+80  A: 

The input material can be an infinite length, where the output is always 128 bits long. This means that an infinite number of input strings will generate the same output.

If you pick a random number and divide it by 2 but only write down the remainder, you'll get either a 0 or 1 -- even or odd, respectively. Is it possible to take that 0 or 1 and get the original number?

Cody Brocious
That is to say, neither number --> remainder nor string --> md5 are "injective functions".
Federico Ramponi
This is a fantastic response, thank you. Simple, concise, and the concept is clearer now.
barfoon
Federico, surely you mean that neither are bijective functions? They are both injective.
Mihai Limbășan
moocha: Injective means 1 to 1. The MD5 is certainly not 1 to 1, as the domain is larger than the range.Another point worth noting is that given a MD5 checksum, it's very hard to find even one string that hashes to it. Might be worth adding to the answer for clarification.
biozinc
Ugh, I should really brush up on formal math again - and should not profess opinions where I'm not sure... Sorry, Federico, thanks, biozinc.
Mihai Limbășan
A exemplary answer, I must say. +1
Adeel Ansari
You state that "an infinite number of strings will generate the same output." Isn't that contradictory to the statement in the question that these types of functions "generate a hash value that is unique." In your random number example produces a value that is almost never unique. I'm confused.
vg1890
It's impossible to have a hash function that generates unique values. You're mapping an infinite number of values into a finite number of values, which guarantees collisions.
Cody Brocious
I'd suggest that your answer does not address the key point. As biozinc mentioned, what's important for a secure password hash is that you can't find any input that creates the output, not that you can't find the original input. On that note, MD5 is not necessarily as secure as it could be (http://en.wikipedia.org/wiki/MD5#Collision_vulnerabilities).
Mike Pelley
+15  A: 

Cody Brocious's answer is the right one. Strictly speaking, you cannot "invert" a hash function because many strings are mapped to the same hash. Notice, however, that either finding one string that gets mapped to a given hash, or finding two strings that get mapped to the same hash (i.e. a collision), would be major breakthroughs for a cryptanalyst. The great difficulty of both these problems is the reason why good hash functions are useful in cryptography.

Federico Ramponi
+6  A: 

MD5 does not create a unique hash value; the goal of MD5 is to quickly produce a value that changes significantly based on a minor change to the source.

E.g.,

"hello" -> "1ab53"
"Hello" -> "993LB"
"ZR#!RELSIEKF" -> "1ab53"

(Obviously that's not actual MD5 encryption)

Most hashes (if not all) are also non-unique; rather, they're unique enough, so a collision is highly improbable, but still possible.

Trevel
+2  A: 

But this is where rainbow tables come into play. Basically it is just a large amount of values hashed separetely and then the result is saved to disk. Then the reversing bit is "just" to do a lookup in a very large table.

Obviously this is only feasible for a subset of all possible input values but if you know the bounds of the input value it might be possible to compute it.

martinlund
Ahh yes. I enjoyed reading Jeff's post on Hash Tables (http://www.codinghorror.com/blog/archives/000949.html), and this thread has helped in the understanding of the concept.
barfoon
+4  A: 

A hash collision is much more likely than you would think. Take a look at the birthday paradox to get a greater understanding of why that is.

Gamic
There are 365 possible birthday values, which is between 2^8 and 2^9. A 128-bit hash has 2^128 possible values -- 2^120 times as many. Yes, collisions are more likely than you might intuit, but they are still astronomically unlikely.
Tim Keating
A: 

by definition Hash(cryptographic Hash) function :should not be invertible;should not have collisions(least possible).

regd your question : it is one way hash. input (irrespective of length) will generate a fixed size output.(it will be padded based on algo(512 bit boundary for MD5) ). The information is compressed(lost) and practically not possible to generate from reverse transforms.

additional info on MD5: it is vulnerable to collisions. gone through this article recently, http://www.win.tue.nl/hashclash/Nostradamus/

opens source code for crypto hash implementations(MD5 and SHA) can be found at Mozilla code. (freebl library).

FL4SOF
+18  A: 

If hash functions such as MD5 were reversible then it would have been a watershed event in the history of data compression algorithms! Its easy to see that if MD5 were reversible then arbitrary chunks of data of arbitrary size could be represented by a mere 128 bits without any loss of information. Thus you would be able to reconstruct the original message from a 128 bit number regardless of the size of the original message.

SDX2000
very nicely said. :)
Mohit Nanda
nicely said indeed.
hasen j
think how quick it would be to download linux distros if you could just get the md5 instead :)
Colin Pickard
@Colin Pickard: we wouldn't be *downloading* linux distros any more, we would be *writing them down*. :)
ΤΖΩΤΖΙΟΥ
+3  A: 

As the number of possible input files is larger than the number of 128-bit outputs, it's impossible to uniquely assign an MD5 hash to each possible.

Cryptographic hash functions are used for checking data integrity or digital signatures (the hash being signed for efficiency). Changing the original document should therefore mean the original hash doesn't match the altered document.

These criteria are sometimes used:

  1. Preimage resistance: for a given hash function and given hash, it should be difficult to find an input that has the given hash for that function.
  2. Second preimage resistance: for a given hash function and input, it should be difficult to find a second, different, input with the same hash.
  3. Collision resistance: for a given has function, it should be difficult to find two different inputs with the same hash.

These criterial are chosen to make it difficult to find a document that matches a given hash, otherwise it would be possible to forge documents by replacing the original with one that matched by hash. (Even if the replacement is gibberish, the mere replacement of the original may cause disruption.)

Number 3 implies number 2.

As for MD5 in particular, it has been shown to be flawed: http://www.infosec.sdu.edu.cn/paper/md5-attack.pdf">How to break MD5 and other hash functions

Unfortunate that paper is down.
James McMahon
"Unfortunately", sorry.
James McMahon
+1  A: 

As most have already said MD5 was designed for variable length data streams to be hashed to a fixed length chunk of data, so a single hash is shared by many input data streams.

However if you ever did need to find out the original data from the checksum, for example if you have the hash of a password and need to find out the original password, it's often quicker to just google (or whatever searcher you prefer) the hash for the answer than to brute force it. I have successfully found out a few passwords using this method.

Tim Matthews
+3  A: 

A good way to think of a hash algorithm is to think of resizing an image in Photoshop... say you have a image that is 5000x5000 pixels and you then resize it to just 32x32. What you have is still a representation of the original image but it is much much smaller and has effectively "thrown away" certain parts of the image data to make it fit in the smaller size. So if you were to resize that 32x32 image back up to 5000x5000 all you'd get is a blurry mess. However because a 32x32 image is not that large it would be theoretically conceivable that another image could be downsized to produce the exact same pixels!

That's just an analogy but it helps understand what a hash is doing.

NathanE
A: 

I'm probably wrong here, but surely during md5 encryption, a certain number of options would give the same single bit result, if you saved which option was used for that satge then surely reversal is possible?

For example

If i turn all 5's into either a 3 or a 2 then

55555 --> 32232

So if i haave another number saying which option each of those is: 12212

then with 12212 and 32232 i can get the result 55555

Sorry if that doesn't make any sense

That's backwards, actually. There's no randomness involved, so the same starting value always produces the same hash.
Tim Keating
+1  A: 

Now a days MD5 hashes or any other hashes for that matter are pre computed for all possible strings and stored for easy access. Though in theory MD5 is not reversible but using such databases you may find out which text resulted in a particular hash value.

For example try the following hash code at http://gdataonline.com/seekhash.php to find out what text i used to compute the hash

aea23489ce3aa9b6406ebb28e0cda430
Babar
Wow, that site is verrrrry interesting. Thanks for sharing,
barfoon
Ah, yes, the hash of a commonplace 7-letter word. Now use it to figure out this 11-word song lyric with whitespace and punctuation: 9f2c08d4e6158bd4854b15be50c8daa8. See you in several millenia.
Tim Keating
6fba2bbab8a8366309bf67c7df12c622? Hint: it might be the OEM version of a specific version of Mac OS X!
scherand
@Tim Keating, @scherand: Just pointing out the weakness of hash algorithms, because hash of a string is always same we don't necessarily need to crack the algorithm to figure out the actual string.
Babar
But that's not what you said. You said that hashes are "precomputed for all possible strings and stored for easy access" which is patently false (the set of "all possible strings" is infinite... and even the set of "all plausible strings" is really, really large). IMHO this misrepresents how easy it is to do a dictionary attack against a reasonable passphrase.
Tim Keating
A: 

If string ABC is hashed by some HASH function ,Are the result every time same ?

HASH(ABC) =string A

again

HASH?(ABC) = string B

string A = string B ???

That's pretty much the definition of a hash function, yes. The other highly desirable property is that any change in the original (even a single bit) should result in a massive change to the hash output.
Tim Keating