views:

440

answers:

9

Duplicate:

Confused about hashes


How can SHA encryption create unique 40 character hash for any string, when there are n infinite number of possible input strings but only a finite number of 40 character hashes?

+22  A: 

SHA is not an encryption algorithm, it is a cryptographic hashing algorithm.

Check out this reference at Wikipedia

jdigital
Side question [Excuse my cryptographic ignorance]: Does this mean that two different input strings could theoretically produce the same hash?
BenAlabaster
Yes, that is correct.
jdigital
It's so extremely unlikely that it's often assumed never to happen.
Henk
+5  A: 

SHA doesn't create a unique 40 character hash for any string. If you create enough hashes, you'll get a collision (two inputs that hash to the same output) eventually. What makes SHA and other hash functions cryptographically useful is that there's no easy way to find two files that will have the same hash.

Chris Upchurch
+8  A: 

The simple answer is that it doesn't create a unique 40 character hash for any string - it's inevitable that different strings will have the same hash.

It does try to make sure that close-by string will have very different hashes. 40 characters is a pretty long hash, so the chance of collision is quite low unless you're doing ridiculous numbers of them.

Greg
+2  A: 

Hash algorithms like SHA-1 or the SHA-2 family are used as "one-way" hashes in support of password-based authentication. It is not computationally feasible to find a message (password) that hashes to a given value. So, if an attacker obtains the list of hashed passwords, they can't determine the original passwords.

You are correct that, in general, there are an infinite number of messages that hash to a given value. It's still hard to find one though.

erickson
+3  A: 

To elaborate on jdigital's answer:

Since it's a hash algorithm and not an encryption algorithm, there is no need to reverse the operation. This, in turn, means that the result does not need to be unique; there are (in theory) in infinite number of strings that will result in the same hash. Finding out which on those are is practically impossible, though.

Rik
A: 

Nice observation. Short answer it can't and leads to collisions which can be exploited in birthday attacks.

jms
+2  A: 

It does not guarantee that two strings will have unique 40 character hashes. What it does is provide an extremely low probability that two strings will have conflicting hashes, and makes it very difficult to create two conflicting documents without just randomly trying inputs.

Generally, a low enough probability of something bad happening is as good as a guarantee that it never will. As long as it's more likely that the world will end when a comet hits it, the chance of a colliding hash isn't generally worth worrying about.

Of course, secure hash algorithms are not perfect. Because they are used in cryptography, they are very valuable things to try and crack. SHA-1, for instance, has been weakened (you can find a collision in 2000 times fewer guesses than just doing random guessing); MD5 has been completely cracked, and security researchers have actually created two certificates which have the same MD5 sum, and got one of them signed by a certificate authority, thus allowing them to use the other one as if it had been signed by the certificate authority. You should not blindly put your faith in cryptographic hashes; once one has been weakened (like SHA-1), it is time to look for a new hash, which is why there is currently a competition to create a new standard hash algorithm.

Brian Campbell
+2  A: 

The function is something like:

hash1 = SHA1(plaintext1)
hash2 = SHA1(plaintext2)

now, hash1 and hash2 can technically be the same. It's a collision. Not common, but possible, and not a problem.

The real magic is in the fact that it's impossible to do this:

plaintext1 = SHA1-REVERSE(hash1)

So you can never reverse it. Handy if you dont want to know what a password is, only that the user gave you the same one both times. Think about it. You have 1024 bytes of input. You get 40 bits of output. How can you EVER reconstruct those 1024 bytes from the 40 - you threw information away. It's just not possible (well, unless you design the algorithm to allow it, I guess....)

Also, if 40 bits isn't enough, use SHA256 or something with a bigger output. And Salt it. Salt is good.

Oh, and as an aside: any website which emails you your password, is not hashing it's passwords. It's either storing them unencrypted (run, run screaming), or encrypting them with a 2 way encryption (DES, AES, public-private key et al - trust them a little more)

There is ZERO reasons for a website to be able to email you your password, or need to store anything but the hash. /rant.

Nic Wise
A: 

The simple answer is: it doesn't create unique hashes. Look at the Pidgeonhole priciple. It's just so unlikely for there to be a collision that nobody has ever found one.

Zifre