tags:

views:

168

answers:

5

I was wondering whether md5, sha1 and anothers return unique values.

For example, sha1() for test returns a94a8fe5ccb19ba61c4c0873d391e987982fbbd3, which is 40 characters long. So, sha1 for strings larger than 40 chars must be the same (of course it's scrambled, because the given input may contain whitespaces and special chars etc.).

Due to this, when we are storing users' passwords, they can enter either their original password or some super-long one, which nobody knows.

Is this right, or do these hash algorithms provide really unique results - I'm quite sure it's hardly possible.

+9  A: 

(Note: You're asking about hashing functions, not encryption).

It's impossible for them to be unique, by definition. They take a large input and reduce its size. It obviously follows, then, that they can't represent all the information they have compressed. So no, they don't provide "truly unique" results.

What they do provide, however, is "collision resistant" results. I.e. they try and show that two slightly different datas produce a significantly different hash.

Noon Silk
+1  A: 

Hashing algorithms never guarantee a different result for a different input. That's why hashing is always used as a one-way "encryption".

But you have to be realistic, a 160-bit hashing algorithm can have 2^160 possible combinations, which is... a lot! (1 with 48 zeroes)

Philippe Leybaert
+1  A: 

These are not encryption functions, but hashing ones.

Hashing, by definition, can have two different strings collide (map to the same value) for the very reasons you mention. But that is usually not relevant because:

  1. Cryptographic hashes (such as SHA1) try hard to make the collision probability for similar strings (very, very) low
  2. You cannot deduce the original string from the hash.

These two mean that you cannot take a hash and easily generate one of the strings that map to it.

Vinko Vrsalovic
+2  A: 

SHA1 is not encryption algorithm, but a cryptographic hash function.

You are right - since it maps arbitrary long input to a fixed size hash there can be collisions. But the idea of a cryptographic hash function is to make it impossible to create such collisions "on demand". That's why we call the one-way hash functions, too.

Quote (source):

The ideal cryptographic hash function has four main or significant properties:
* it is easy to compute the hash value for any given message,
* it is infeasible to find a message that has a given hash,
* it is infeasible to modify a message without changing its hash,
* it is infeasible to find two different messages with the same hash.

tanascius
+6  A: 

Hashing algorithms (which is what you are referring to) do not provide unique results. What you are referring to is called the Pigeonhole Principle. The number of inputs exceeds the number of outputs, so multiple inputs must be mapped to the same output. This is why the longer the output hash the better, because there are less number of inputs mapped to an output.

Encrypting something must provide a unique results, because you can encrypt a message and decrypt it and get the same message.

Kevin
+1 for pigeonhole principle which is the proof you want
jk