
Is hashing a password twice before storage any more or less secure than just hashing it once?

What I'm talking about is doing this:

$hashed_password = md5( md5( plaintext_password ) );

instead of just this:

$hashed_password = md5( plaintext_password );

If it is less secure, can you provide a good explanation (or a link to one)?

Also, does the hash function used make a difference? Does it make any difference if you mix md5 and sha1 (for example) instead of repeating the same hash function?

Note 1: When I say "double hashing" I'm talking about hashing a password twice in an attempt to make it more obscured. I'm not talking about the technique for resolving collisions.

Note 2: I know I need to add a random salt to really make it secure. The question is whether hashing twice with the same algorithm helps or hurts the hash.

+7  A: 

Yes - it reduces the number of possible strings that match the string.

As you have already mentioned, salted hashes are much better.

An article here: http://websecurity.ro/blog/2007/11/02/md5md5-vs-md5/, attempts a proof of why it is equivalent, but I'm not sure about the logic. They partly assume that there isn't software available to analyse md5(md5(text)), but obviously it's fairly trivial to produce the rainbow tables.

I'm still sticking with my answer that there is a smaller number of md5(md5(text))-type hashes than md5(text) hashes, increasing the chance of collision (even if still an unlikely probability) and reducing the search space.

Rich Bradshaw
This is incorrect. And the cited link (which isn't authoritative anyway), finishes by saying that multiple rounds is safer--if you want to take advice from "luca" in Romania who heard from a friend that multiple rounds isn't as good. Read Applied Cryptography, or PKCS#5.
erickson
I didn't say that I thought that the link was authoritative, rather that it says the opposite to what I would think. I'm not an expert on this subject, my post is just my gut reaction to the question.
Rich Bradshaw
Ah, I was misreading your post. Yes, it is conceivable that all 2^128 hash outputs are not produced with inputs from 0 to 2^128 - 1, which would reduce the space. However, since MD5 only uses 128 bits of internal state, my gut reaction is to say that each 128 bit input yields a unique output.
erickson
Technically correct but cryptographically insignificant, therefore incorrect. Don't forget that the easiest vector of attack on password-based cryptography isn't the derived key, it's the password itself.
orip
Thanks for the link. I understand it's not authoritative, but it gives me a good starting point.
Bill the Lizard
A: 

I'm going to go out on a limb and say it's more secure in certain circumstances... don't downvote me yet though!

From a mathematical / cryptographical point of view, it's less secure, for reasons that I'm sure someone else will give you a clearer explanation of than I could.

However, there exist large databases of MD5 hashes, which are more likely to contain the "password" text than the MD5 of it. So by double-hashing you're reducing the effectiveness of those databases.

Of course, if you use a salt then this advantage goes away.

Greg
But it's quite straightforward to generate a database of md5(md5(text)) data, it just takes a while.
Rich Bradshaw
Yeah, the point is that the databases of md5(text) already exist
Greg
I wouldn't say that it is any more feasible to create a database of md5(md5(x)) than it is to do md5(x), and building databases of md5(x) takes more than a little while.
SoapBox
Google for "md5 database", put in your md5(text) string... voila, md5 cracked in a matter of seconds (if it's in the DB of course ;)
Greg
Yes, but is it more secure other than the fact that no one is producing databases? Surely if there was an advantage to md5(md5(text)) then that would already be part of the md5 algorithm.
Rich Bradshaw
No, as a matter of fact it's less secure. The absence of a DB is the only advantage.
Greg
Can we just agree to put in big letters on every post "assuming you are using a secret salt", and stop having to talk about rainbow tables?
SquareCog
+2  A: 

In general, it provides no additional security to double hash or double encrypt something. If you can break the hash once, you can break it again. It usually doesn't hurt security to do this, though.

In your example of using MD5, as you probably know there are some collision issues. "Double Hashing" doesn't really help protect against this, since the same collisions will still result in the same first hash, which you can then MD5 again to get the second hash.

This does protect against dictionary attacks, like those "reverse MD5-databases", but so does salting.

On a tangent, double encrypting something doesn't provide any additional security, because all it does is result in a different key which is a combination of the two keys actually used. So the effort to find the "key" is not doubled, because two keys do not actually need to be found. This isn't true for hashing, because the result of the hash is not usually the same length as the original input.
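The collision point above can be made concrete with a short sketch (Python rather than PHP, purely for illustration): double hashing is just the hash of the first hash's digest, so any two inputs that collide on the inner MD5 necessarily produce identical outer hashes too.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the hex digest of MD5, as PHP's md5() does."""
    return hashlib.md5(data).hexdigest()

# The outer hash depends only on the inner digest, so inputs that
# collide on the inner MD5 share the same double hash as well --
# double hashing adds no collision resistance.
single = md5_hex(b"password")
double = md5_hex(single.encode())
```

The output space also stays the same 128 bits, so nothing is gained on that front either.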

SoapBox
All correct, but I just want to note that the effect of the strong collision resistance compromise on MD5 is blown a bit out of proportion -- most scenarios that use crypto hash functions do not rely on strong collision resistance, just weak resistance. They are not affected by this vulnerability.
SquareCog
+21  A: 
erickson
It is less secure mathematically, and you'd be much better off using a sleep() than intentionally making a slow algorithm
Greg
Prove that it is less secure mathematically. Why would an attacker throw a sleep into his cracker? This makes no sense whatsoever.
erickson
Number of arbitrary strings: infinity. number of possible MD5 values: 2^128. Sleep works because they are using your server to authenticate in RoBorg's scenario. Now, number of likely passwords is much lower than 2^128, and so is number of derived Md5s, so that argument is questionable.
SquareCog
Intentionally making a slow algorithm is an accepted practice when you're trying to prevent dictionary attacks against compromised authentication stores. The technique is called "key strengthening" or "key stretching". See http://en.wikipedia.org/wiki/Key_stretching
Willie Wheeler
@RoBorg: it doesn't matter how slow _your_ implementation is, but how slow an attacker's implementation will be: if the hash itself is thousands of times slower, it will take an attacker thousands of times as long to brute-force the password.
orip
@erickson - with PBKDF2 you increase the number of iterations to increase the processing time, so that isn't a special bcrypt property. Bcrypt, on the other hand, hasn't received nearly as much cryptological review as HMAC and the SHA-2 hashes have.
orip
Ahh I see what you mean now
Greg
Arguably you would want collisions within the 128-bit space 0 through 2^128-1. If the hash algorithm's 2^128 output space is perfect, then theoretically, you just have a substitution cipher with an alphabet of 2^128 glyphs.
jmucchiello
Substitution ciphers are weak because frequency distributions in the plaintext are propagated to the ciphertext. That wouldn't apply here, where every plaintext "character" is equally likely.
erickson
The chances of you coming across an md5 collision by accident are the same as winning the intergalactic lottery; it's just not going to happen. http://stackoverflow.com/questions/800685/which-hash-function-should-i-choose/817121#817121
Sam Saffron
Wouldn't it be true that if MD5(K) = [X-byte result], then MD5(MD5(K)) is LESS secure (more prone to collision) when your K is > [X bytes], and equally secure (i.e. redundant/pointless) when K <= [X bytes]? Restricting the number of times a user can retry a password is a CRITICAL security measure, but if you are increasing your CPU usage 1000-fold for every user, then your solution is ATROCIOUSLY unscalable.
So, (because I didn't explicitly say this) you shouldn't be hashing twice; you should favour other methods that are specifically designed to limit the number of retries, rather than taxing the CPU to do it for you.
devin
@devin -- it's not "my solution", it's a widely accepted practice, built into password-based cryptography standards like PKCS #5, and recommended by experts like Robert Morris. It's extremely scalable, as the fraction of time spent authenticating users is small in a legitimate application. It only becomes difficult to scale when your application is cracking passwords—hence the recommendation. Certainly, the search space of a hash is smaller than that of possible passwords, but even a 128-bit space is too huge to brute-force search. The threat to defend against is an offline dictionary attack.
erickson
I was referring not to the inconvenience to the individual user, but rather the stress that would be put upon the server if you had a large user base, because you are relying on the CPU load to slow down the number of requests. It means that if you add more CPU power, you are reducing the restriction on those brute-force attackers. -- However, you are completely correct about the scalability, and the widely accepted practice. I was wrong about almost all the things I said in my earlier comments. Sorry :)
devin
+1  A: 

From what I've read, it may actually be recommended to re-hash the password hundreds or thousands of times.

The idea is that if you can make it take more time to encode the password, it's more work for an attacker to run through many guesses to crack the password. That seems to be the advantage to re-hashing -- not that it's more cryptographically secure, but it simply takes longer to generate a dictionary attack.

Of course computers get faster all the time, so this advantage diminishes over time (or requires you to increase the iterations).
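The re-hashing idea can be sketched like this (a minimal Python illustration using SHA-256; in real code you'd use a vetted construction such as PBKDF2 rather than a hand-rolled loop):

```python
import hashlib
import os

def stretched_hash(password: bytes, salt: bytes, iterations: int = 10000) -> bytes:
    """Naive key stretching: feed the digest back through the hash many times,
    making each password guess proportionally more expensive for an attacker."""
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(iterations):
        digest = hashlib.sha256(digest + salt + password).digest()
    return digest

salt = os.urandom(16)
stored = stretched_hash(b"hunter2", salt)
```

The iteration count is the tunable knob: as hardware gets faster, you raise it.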

Bill Karwin
I mentioned this in another comment too, but http://en.wikipedia.org/wiki/Key_stretching
Willie Wheeler
+2  A: 

Personally I wouldn't bother with multiple hashes, but I'd make sure to also hash the UserName (or another user ID field) as well as the password, so two users with the same password won't end up with the same hash. I'd probably also throw some other constant string into the input string for good measure.

$hashed_password = md5( "xxx" . "|" . $user_name . "|" . $plaintext_password );
Ben Daniel
Actually, it should be a string randomly generated for each user, not a constant.
Bill the Lizard
A constant secret works (and is easier to work with), if you throw in the username as suggested. That essentially produces a random user-specific key.
SquareCog
A constant secret salt is security through obscurity. If the "secret" gets out that you're using "xxx" + username + password, then an attacker doesn't even need data from your tables to launch an attack against it.
Bill the Lizard
I don't think that it's security through obscurity. The reason for using a salt is that you can't compute a rainbow table against multiple md5 hashes simultaneously. Building one for "xxx"+password (same salt) happens once. Building a table for "xxx"+username+password is worse than brute forcing.
FryGuy
@FryGuy: usernames aren't secret. If the "secret" salt xxx is used, the attack is reduced to building one dictionary to attack a specific username. If you use random salts then your database would have to be compromised before the same attack would be possible.
Bill the Lizard
@Bill the Lizard: "the attack is reduced to building one dictionary to attack a specific username" is just a brute-force attack (actually even worse, because in addition to computing all hashes you have to store them), so the salt works perfectly in this case.
porneL
Wow, I'm starting to feel a wee bit naive/ignorant on this topic! What the heck is a rainbow table? lol I'm finding these comments a bit hard to follow, not sure what we're assuming the hacker has access to and how they're attacking. I guess I need to do some more research on this! :)
Ben Daniel
@Ben Daniel, a [rainbow table](http://en.wikipedia.org/wiki/Rainbow_table) is a precomputed list of known hashes and what plaintext resulted in those hashes.
PP
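The random per-user salt that Bill the Lizard recommends can be sketched as follows (Python for illustration, with SHA-256 standing in for MD5 and a dict standing in for the user table; the function names are made up):

```python
import hashlib
import os

users = {}  # username -> (salt, hex digest); stands in for a real user table

def create_user(username: str, password: str) -> None:
    salt = os.urandom(16)  # fresh random salt per user, stored next to the hash
    users[username] = (salt, hashlib.sha256(salt + password.encode()).hexdigest())

def check_login(username: str, password: str) -> bool:
    salt, stored = users[username]
    return hashlib.sha256(salt + password.encode()).hexdigest() == stored

create_user("alice", "secret")
create_user("bob", "secret")
```

Because each salt is random, alice and bob get different hashes even though their passwords are identical, and a precomputed rainbow table is useless against the whole database at once.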
A: 

The concern about reducing the search space is mathematically correct, although the search space remains large enough for all practical purposes (assuming you use salts): 2^128. However, since we are talking about passwords, the number of possible 16-character strings (alphanumeric, case-sensitive, a few symbols thrown in) is roughly 2^98, according to my back-of-the-envelope calculations. So the perceived decrease in the search space is not really relevant.

Aside from that, there really is no difference, cryptographically speaking.

Although there is a crypto primitive called a "hash chain" -- a technique that allows you to do some cool tricks, like disclosing a signature key after it's been used without sacrificing the integrity of the system. Given minimal time synchronization, it allows you to cleanly sidestep the problem of initial key distribution. Basically, you precompute a large set of hashes of hashes - h(h(h(h....(h(k))...))) - and use the nth value to sign; after a set interval, you send out the key and sign it using key (n-1). The recipients can now verify that you sent all the previous messages, and no one can fake your signature since the time period for which it was valid has passed.
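The hash-chain construction described above can be sketched like so (Python, with SHA-256 standing in for a generic hash; a real scheme like Lamport/TESLA has more moving parts):

```python
import hashlib

def build_chain(seed: bytes, n: int) -> list:
    """Precompute k, h(k), h(h(k)), ..., h^n(k); only the last element is published."""
    chain = [seed]
    for _ in range(n):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain

chain = build_chain(b"initial-secret", 100)
anchor = chain[-1]  # distributed up front as the trust anchor

def verify(value: bytes, steps: int, anchor: bytes) -> bool:
    """Re-hash a revealed chain value forward and check it reaches the anchor."""
    for _ in range(steps):
        value = hashlib.sha256(value).digest()
    return value == anchor
```

Revealing values in reverse order (chain[99], then chain[98], ...) lets anyone verify each one against the anchor, while earlier values remain unpredictable.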

Re-hashing hundreds of thousands of times like Bill suggests is just a waste of your CPU... use a longer key if you are concerned about people breaking 128 bits.

SquareCog
Re-hashing is precisely about slowing down the hash. This is a key security feature in password-based cryptography. See the links for PKCS #5 and PBKDF2.
orip
+13  A: 

Yes, re-hashing reduces the search space, but no, it doesn't matter - the effective reduction is insignificant.

Re-hashing increases the time it takes to brute-force, but doing so only twice is also suboptimal.

What you really want is to hash the password with PBKDF2 - a proven method of using a secure hash with salt and iterations. Check out this SO response.

EDIT: I almost forgot - DON'T USE MD5!!!! Use a modern cryptographic hash such as the SHA-2 family (SHA-256, SHA-384, and SHA-512).
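For reference, here's a minimal sketch of what PBKDF2-based password storage looks like (shown with Python's standard library rather than PHP; the iteration count is an illustrative choice, not a recommendation):

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # tune so hashing takes a noticeable fraction of a second

def hash_password(password: str):
    """Derive a key from the password with a fresh random salt."""
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, dk  # store both; the salt is not secret

def verify_password(password: str, salt: bytes, dk: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, dk)  # constant-time comparison

salt, dk = hash_password("correct horse battery staple")
```

PBKDF2 is exactly "hashing many times" done properly: salted, keyed via HMAC, and with a standardized, reviewable construction.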

orip
A: 

Double hashing is ugly because it's more than likely an attacker has built a table to come up with most hashes. Better to salt your hashes and mix hashes together. There are also new schemes to "sign" hashes (basically salting), but in a more secure manner.

Sargun Dhillon
Tables of common passwords and dictionary words (and their hashes) certainly exist, but I question whether tables with hashes of hashes exist. Salting is certainly better, as I mentioned in the question.
Bill the Lizard
A: 

As several responses in this article suggest, there are some cases where it may improve security and others where it definitely hurts it. There is a better solution that will definitely improve security: instead of doubling the number of times you calculate the hash, double the size of your salt, or double the number of bits used in the hash, or do both! Instead of SHA-256, jump up to SHA-512.

Stefan Rusek
This doesn't answer the question.
Bill the Lizard
Double hashing is not worth the effort, but doubling your hash size is. I think this is a more valuable point.
Stefan Rusek
A: 

I know this is kind of an old post, but I can't seem to find a definitive answer to this question either. I'm no mathematician, but theoretically, wouldn't using two different salts in two different hashes result in a hash that would take twice as long to crack? I also pause my login scripts to slow down brute-force attacks (just remember to limit your web server connections per IP, or bye-bye Apache).

example:

$hash = md5(SECONDARYSEED.$userid.md5(DEFAULTSEED.$userid.'zt!s94d#x'.$password));

sleep(3);

return $hash;

I always use a hard-coded string (in case variables in memory are somehow compromised), something unique to the string I'm trying to encode (like a user_id), and two salts that are globally defined for the application. I feel pretty confident you won't find this in a rainbow table, but would it take twice as long to brute force like this?

Jasen
SO is not a forum. Only post an answer if it's truly an answer. To answer your question, double hashing will double the time it takes to calculate a hash, but the difference is usually not significant for security purposes (the exception being if the content of a message is extremely time sensitive, and a result in 5 hours (or 2 weeks) is useful but 10 hours (or 4 weeks) isn't). As for string constants, they will be stored in memory while a program is running.
outis