I'm having some trouble understanding the purpose of a salt to a password. It's my understanding that the primary use is to hamper a rainbow table attack. However, the methods I've seen to implement this don't seem to really make the problem harder.

I've seen many tutorials suggesting that the salt be used as the following:

$hash = md5($salt.$password)

The reasoning being that the hash now maps not to the original password, but to a combination of the password and the salt. But say $salt=foo and $password=bar and $hash=3858f62230ac3c915f300c664312c63f. Now somebody with a rainbow table could reverse the hash and come up with the input "foobar". They could then try all combinations of passwords (f, fo, foo, ... oobar, obar, bar, ar, r). It might take a few more milliseconds to get the password, but not much else.

The other use I've seen is on my linux system. In the /etc/shadow the hashed passwords are actually stored with the salt. For example a salt of "foo" and password of "bar" would hash to this: $1$foo$te5SBM.7C25fFDu6bIRbX1. If a hacker somehow were able to get his hands on this file, I don't see what purpose the salt serves, since the reverse hash of te5SBM.7C25fFDu6bIRbX1 is known to contain "foo".
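To make the layout of that /etc/shadow field concrete, here is a small Python sketch that splits the example string into its parts. The function name `parse_shadow_hash` is made up for illustration, and the sketch only assumes the `$id$salt$hash` layout shown above (the `1` identifier marks the MD5-based crypt scheme).

```python
def parse_shadow_hash(entry: str) -> dict:
    """Split a crypt-style '$id$salt$hash' string into its three fields."""
    # The string starts with '$', so the first element of the split is empty.
    _, scheme_id, salt, digest = entry.split("$")
    return {"id": scheme_id, "salt": salt, "hash": digest}


fields = parse_shadow_hash("$1$foo$te5SBM.7C25fFDu6bIRbX1")
print(fields["id"])    # scheme identifier
print(fields["salt"])  # the salt, stored in the clear next to the hash
print(fields["hash"])  # the encoded digest of salt + password
```

The salt is deliberately readable: the system needs it to recompute the hash when the user logs in.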

Thanks for any light anybody can shed on this.

EDIT: Thanks for the help. To summarize what I understand, the salt makes the hashed password more complex, thus making it much less likely to exist in a precomputed rainbow table. What I misunderstood before was that I was assuming a rainbow table existed for ALL hashes.

+32  A: 

The idea with the salt is to make it much harder to guess with brute-force than a normal character-based password. Rainbow tables are often built with a special character set in mind, and don't always include all possible combinations (though they can).

So a good salt value would be a random 128-bit or longer integer. This is what makes rainbow-table attacks fail. By using a different salt value for each stored password, you also ensure that a rainbow table built for one particular salt value (as could be the case if you're a popular system with a single salt value) does not give you access to all passwords at once.
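A minimal Python sketch of that advice, assuming the `md5($salt.$password)` scheme from the question. The names `new_salt` and `hash_password` are hypothetical, and MD5 is used only to mirror the question, not as a recommendation.

```python
import hashlib
import secrets


def new_salt() -> str:
    """Return 128 random bits (16 bytes) encoded as 32 hex characters."""
    return secrets.token_hex(16)


def hash_password(password: str, salt: str) -> str:
    """Mirror the question's md5($salt.$password)."""
    return hashlib.md5((salt + password).encode("utf-8")).hexdigest()


# A fresh salt is drawn for every stored password, then kept next to the hash.
salt = new_salt()
stored = (salt, hash_password("bar", salt))
print(stored)
```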

Carl Seleborg
+1: Salt can be a portion of the hex digest of some random string built by the random number generator. Each bit is random.
S.Lott
"Rainbow tables are one form of dictionary attack that gives up some speed to save storage space." - it's actually the opposite, a good rainbow table can take over GB to store, in order to save time re-hashing all possible values.
AviD
Agreed - @erickson, I think your edit is wrong there. A rainbow table requires *huge* amounts of storage, but makes it fast to get the message behind the hash.
Carl Seleborg
Well, you are both right. Compared to a standard dictionary attack, rainbow tables sacrifice speed in order to save storage space. On the other hand, compared to a brute force attack, rainbow tables use (lots of) space to gain speed. Today, rainbow tables are almost synonymous with dictionary ...
Rasmus Faber
... attacks, but you don't need rainbow tables for dictionary attacks.
Rasmus Faber
+3  A: 

One purpose of salting is to defeat precomputed hash tables. If someone has a list of millions of pre-computed hashes, they aren't going to be able to look up $1$foo$te5SBM.7C25fFDu6bIRbX1 in their table even though they know the hash and the salt. They'll still have to brute force it.

Another purpose, as Carl S mentions, is to make brute forcing a list of hashes more expensive (give them all different salts).

Both of these objectives are still accomplished even if the salts are public.

recursive
+1  A: 

As far as I know, the salt is intended to make dictionary attacks harder.

It's a known fact that many people will use common words for passwords instead of seemingly random strings.

So, a hacker could use this to his advantage instead of using just brute force. He will not look for passwords like aaa, aab, aac... but instead use words and common passwords (like lord of the rings names! ;) )

So if my password is Legolas a hacker could try that and guess it with a "few" tries. However if we salt the password and it becomes fooLegolas the hash will be different, so the dictionary attack will be unsuccessful.

Hope that helps!

Rafa G. Argente
+8  A: 

The reason a salt can make a rainbow-table attack fail is that for n bits of salt, the rainbow table has to be 2^n times larger than the table size without the salt.

Your example of using 'foo' as a salt could make the rainbow-table 16 million times larger.

Given Carl's example of a 128-bit salt, this makes the table 2^128 times larger - now that's big - or put another way, how long before someone has portable storage that big?
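The arithmetic can be checked with a few lines of Python. This is only a sketch: the helper name `table_growth` is made up, and it assumes every bit pattern of the salt is possible, so 'foo' counts as 3 bytes = 24 bits.

```python
def table_growth(salt_bits: int) -> int:
    """Number of distinct salts, i.e. how many times larger the table must be."""
    return 2 ** salt_bits


foo_bits = len("foo".encode("ascii")) * 8  # 3 bytes = 24 bits

print(table_growth(foo_bits))  # 16777216, roughly 16 million
print(table_growth(128))       # 340282366920938463463374607431768211456
```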

quamrana
Even if you use a single electron to store a bit, it will be quite a while before anyone produces portable storage with that capacity... unless you consider a solar system moving through the galaxy portable.
erickson
+1  A: 

I assume that you are using PHP (md5() function, $-prefixed variables). If so, you can try looking at this article: Shadow Password HOWTO, especially the 11th paragraph.

Also, if you are afraid of using message digest algorithms, you can try real cipher algorithms, such as the ones provided by the mcrypt module, or stronger message digest algorithms, such as the ones provided by the mhash module (sha1, sha256, and others).

I think that stronger message digest algorithms are a must. MD5 and SHA1 are known to have collision problems.

daniel
+36  A: 

A public salt will not make dictionary attacks harder when cracking a single password. As you've pointed out, the attacker has access to both the hashed password and the salt, so when running the dictionary attack, she can simply use the known salt when attempting to crack the password.

A public salt does two things: makes it more time-consuming to crack a large list of passwords, and makes it infeasible to use a rainbow table.

To understand the first one, imagine a single password file that contains hundreds of usernames and passwords. Without a salt, I could compute "md5(attempt[0])", and then scan through the file to see if that hash shows up anywhere. If salts are present, then I have to compute "md5(salt[a] . attempt[0])", compare against entry A, then "md5(salt[b] . attempt[0])", compare against entry B, etc. Now I have n times as much work to do, where n is the number of usernames and passwords contained in the file.
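The difference in work can be sketched in Python by counting hash computations. Everything here (the user names, the salts, the function names `crack_unsalted` and `crack_salted`) is invented for illustration, and MD5 is used only because the example above does.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crack_unsalted(stored_hashes, attempts):
    """One hash per attempt, then scan the whole file for that hash."""
    computed, found = 0, {}
    for attempt in attempts:
        digest = md5_hex(attempt)
        computed += 1
        for user, stored in stored_hashes.items():
            if stored == digest:
                found[user] = attempt
    return found, computed


def crack_salted(stored_records, attempts):
    """One hash per attempt per entry, because every entry has its own salt."""
    computed, found = 0, {}
    for attempt in attempts:
        for user, (salt, stored) in stored_records.items():
            digest = md5_hex(salt + attempt)
            computed += 1
            if stored == digest:
                found[user] = attempt
    return found, computed


users = {"alice": "hello", "bob": "qwerty", "carol": "foobar"}
salts = {"alice": "jX95psDZ", "bob": "dZVUABJt", "carol": "LPgB0sdg"}
attempts = ["hello", "foobar", "qwerty", "letmein"]

unsalted = {u: md5_hex(p) for u, p in users.items()}
salted = {u: (salts[u], md5_hex(salts[u] + p)) for u, p in users.items()}

found_u, work_u = crack_unsalted(unsalted, attempts)
found_s, work_s = crack_salted(salted, attempts)
print(work_u, work_s)  # 4 12
```

With 3 entries the salted file costs 3 times as many hash computations for the same list of attempts, which is the "n times as much work" described above.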

To understand the second one, you have to understand what a rainbow table is. A rainbow table is a large list of pre-computed hashes for commonly-used passwords. Imagine again the password file without salts. All I have to do is go through each line of the file, pull out the hashed password, and look it up in the rainbow table. I never have to compute a single hash. If the look-up is considerably faster than the hash function (which it probably is), this will considerably speed up cracking the file.

But if the password file is salted, then the rainbow table would have to contain "salt . password" pre-hashed. If the salt is sufficiently random, this is very unlikely. I'll probably have things like "hello" and "foobar" and "qwerty" in my list of commonly-used, pre-hashed passwords (the rainbow table), but I'm not going to have things like "jX95psDZhello" or "LPgB0sdgxfoobar" or "dZVUABJtqwerty" pre-computed. That would make the rainbow table prohibitively large.
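A rough Python sketch of that lookup. Note the assumption: a plain dictionary stands in for the pre-computed table here, whereas a real rainbow table uses hash chains to save space. The lookup idea is the same.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Pre-computed once by the attacker: hash -> commonly used password.
common_passwords = ["hello", "foobar", "qwerty"]
table = {md5_hex(p): p for p in common_passwords}

unsalted_hash = md5_hex("hello")
salted_hash = md5_hex("jX95psDZ" + "hello")

hit = table.get(unsalted_hash)  # found without computing any new hash
miss = table.get(salted_hash)   # "jX95psDZhello" was never pre-computed

print(hit)   # hello
print(miss)  # None
```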

So, the salt reduces the attacker back to one-computation-per-row-per-attempt, which, when coupled with a sufficiently long, sufficiently random password, is (generally speaking) uncrackable.

Ross
dictionary attacks are not strictly the same as rainbow table attacks...
hop
I'm not sure what I said in my answer to imply that they were?
Ross
erickson, I think the edit was confusing--I don't think most people consider a rainbow table attack to be a kind of dictionary attack. Let me know if there's something specific you think is confusing in my answer, and I'll try to correct it.
Ross
+25  A: 

The other answers don't seem to address your misunderstandings of the topic, so here goes:

Two different uses of salt

I've seen many tutorials suggesting that the salt be used as the following:

$hash = md5($salt.$password)

[...]

The other use I've seen is on my linux system. In the /etc/shadow the hashed passwords are actually stored with the salt.

You always have to store the salt with the password, because in order to validate what the user entered against your password database, you have to combine the input with the salt, hash it and compare it to the stored hash.
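In Python, that store-then-verify cycle might look like the sketch below. The names `make_record` and `verify` are hypothetical, and MD5 appears only because the question uses it.

```python
import hashlib
import hmac
import secrets


def make_record(password: str) -> tuple:
    """Return (salt, hash) to be stored together in the password database."""
    salt = secrets.token_hex(16)
    digest = hashlib.md5((salt + password).encode("utf-8")).hexdigest()
    return salt, digest


def verify(entered: str, record: tuple) -> bool:
    """Combine the input with the stored salt, hash it, compare to the stored hash."""
    salt, stored_digest = record
    candidate = hashlib.md5((salt + entered).encode("utf-8")).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(candidate, stored_digest)


record = make_record("bar")
print(verify("bar", record))  # True
print(verify("baz", record))  # False
```

Without the stored salt, `verify` would have no way to rebuild the input that was originally hashed.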

Security of the hash

Now somebody with a rainbow table could reverse the hash and come up with the input "foobar".

[...]

since the reverse hash of te5SBM.7C25fFDu6bIRbX1 is known to contain "foo".

It is not possible to reverse the hash as such (in theory, at least). The hash of "foo" and the hash of "saltfoo" have nothing in common. Changing even one bit in the input of a cryptographic hash function should completely change the output.

This means you cannot build a rainbow table with the common passwords and then later "update" it with some salt. You have to take the salt into account from the beginning.

This is the whole reason why you need a rainbow table in the first place. Because you cannot get to the password from the hash, you precompute all the hashes of the most likely used passwords and then compare your hashes with their hashes.
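The claim that the hash of "foo" and the hash of "saltfoo" have nothing in common can be checked by counting how many of the 128 output bits differ. A small Python sketch (the helper names are made up, and MD5 is used only to match the question):

```python
import hashlib


def md5_int(text: str) -> int:
    """MD5 digest of the text as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Number of bit positions where the two digests disagree."""
    return bin(md5_int(a) ^ md5_int(b)).count("1")


print(hashlib.md5(b"foo").hexdigest())
print(hashlib.md5(b"saltfoo").hexdigest())
print(differing_bits("foo", "saltfoo"))  # out of 128 bits
```

For a well-behaved hash function you would expect roughly half of the bits to differ, which is why a table built for unsalted inputs tells you nothing about salted ones.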

Quality of the salt

But say $salt=foo

"foo" would be an extremely poor choice of salt. Normally you would use a random value, encoded in ASCII.

Also, each password has its own salt, different (hopefully) from all other salts on the system. This means that the attacker has to attack each password individually instead of hoping that one of the hashes matches one of the values in her database.

The attack

If a hacker somehow were able to get his hands on this file, I don't see what purpose the salt serves,

A rainbow table attack always needs /etc/passwd (or whatever password database is used), or else how would you compare the hashes in the rainbow table to the hashes of the actual passwords?

As for the purpose: let's say the attacker wants to build a rainbow table for 100,000 commonly used English words and typical passwords (think "secret"). Without salt she would have to precompute 100,000 hashes. Even with the traditional UNIX salt of 2 characters (each is one of 64 choices: [a-zA-Z0-9./]) there are 64 × 64 = 4,096 possible salts, so she would have to compute and store 409,600,000 hashes... quite an improvement.
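The count for the traditional 2-character salt can be worked out in a few lines of Python (variable names are just for illustration):

```python
import string

# The 64 characters allowed in a traditional UNIX salt: [a-zA-Z0-9./]
salt_alphabet = string.ascii_letters + string.digits + "./"
salt_length = 2
dictionary_size = 100_000

salt_count = len(salt_alphabet) ** salt_length   # 64 * 64
total_hashes = salt_count * dictionary_size      # one table entry per (salt, word) pair

print(len(salt_alphabet))  # 64
print(salt_count)          # 4096
print(total_hashes)        # 409600000
```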

hop
Really nice answer. It helped me understand things so much better. +1
wcm
+3  A: 

Most methods of breaking hash-based password storage rely on brute-force attacks. A rainbow attack is essentially a more efficient dictionary attack: it uses the low cost of digital storage to build a map from a substantial subset of possible passwords to their hashes, which makes the reverse mapping easy. This sort of attack works because many passwords tend to be either fairly short or to follow one of a few word-based patterns.

Such attacks are ineffective when passwords contain many more characters and do not conform to common word-based formats. A user with a strong password to start with won't be vulnerable to this style of attack. Unfortunately, many people do not pick good passwords. But there's a compromise: you can improve a user's password by adding random junk to it. So now, instead of "hunter2", their password could effectively become "hunter2908!fld2R75{R7/;508PEzoz^U430", which is a much stronger password. However, because you now have to store this additional password component, the composite password is not as strong as a fully secret password of the same length. As it turns out, there's still a net benefit to such a scheme, since each password, even a weak one, is no longer vulnerable to the same pre-computed hash / rainbow table. Instead, each password hash entry is vulnerable only to a table built for its own salt.

Say you have a site which has weak password strength requirements. If you use no password salt at all, your hashes are vulnerable to pre-computed hash tables. Someone with access to your hashes would thus have access to the passwords for a large percentage of your users (however many used vulnerable passwords, which would be a substantial percentage). If you use a constant password salt, then pre-computed hash tables are no longer valuable, so someone would have to spend the time to compute a custom hash table for that salt. They could do so incrementally though, computing tables which cover ever greater portions of the problem space. The most vulnerable passwords (e.g. simple word-based passwords, very short alphanumeric passwords) would be cracked in hours or days; less vulnerable passwords would be cracked after a few weeks or months. As time goes on, an attacker would gain access to passwords for an ever-growing percentage of your users. If you use a unique salt for every password, then it would take days or months to gain access to each one of those vulnerable passwords.

As you can see, when you step up from no salt to a constant salt to a unique salt, you impose an increase of several orders of magnitude in the effort needed to crack vulnerable passwords at each step. Without a salt, the weakest of your users' passwords are trivially accessible. With a constant salt, those weak passwords are accessible to a determined attacker. With a unique salt, the cost of accessing passwords is raised so high that only the most determined attacker could gain access to a tiny subset of vulnerable passwords, and then only at great expense.

Which is precisely the situation to be in. You can never fully protect users from poor password choice, but you can raise the cost of compromising your users' passwords to a level that makes compromising even one user's password prohibitively expensive.

Wedge
+6  A: 

Yet another great question, with many very thoughtful answers -- +1 to SO!

One small point that I haven't seen mentioned explicitly is that, by adding a random salt to each password, you're virtually guaranteeing that two users who happened to choose the same password will produce different hashes.

Why is this important?

Imagine the password database at a large software company in the northwest US. Suppose it contains 30,000 entries, of which 500 have the password bluescreen. Suppose further that a hacker manages to obtain this password, say by reading it in an email from the user to the IT department. If the passwords are unsalted, the hacker can find the hashed value in the database, then simply pattern-match it to gain access to the other 499 accounts.

Salting the passwords ensures that each of the 500 accounts has a unique (salt+password), generating a different hash for each of them, and thereby reducing the breach to a single account. And let's hope, against all probability, that any user naive enough to write a plaintext password in an email message doesn't have access to the undocumented API for the next OS.
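A scaled-down Python sketch of that scenario, with 5 "bluescreen" users instead of 500. The user names are invented and MD5 simply mirrors the question.

```python
import hashlib
import secrets


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Five users share a password; one user picked something else.
passwords = {"user%d" % i: "bluescreen" for i in range(5)}
passwords["user5"] = "hunter2"

# Unsalted: knowing one plaintext exposes every row with the same hash.
unsalted = {user: md5_hex(pw) for user, pw in passwords.items()}
leaked_hash = md5_hex("bluescreen")
exposed = [user for user, h in unsalted.items() if h == leaked_hash]

# Salted: each row gets its own random salt, so equal passwords hash differently.
salted = {}
for user, pw in passwords.items():
    salt = secrets.token_hex(16)
    salted[user] = (salt, md5_hex(salt + pw))
distinct_hashes = len({h for _, h in salted.values()})

print(len(exposed))      # 5 rows match by simple pattern-matching
print(distinct_hashes)   # 6, one distinct hash per row
```

With salts, the attacker can still test "bluescreen" against each row, but has to recompute the hash with that row's salt every time instead of spotting the duplicates at a glance.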

Adam Liss
+1  A: 

I'm going to post this as I have a question that I'm not sure has been answered to my understanding:

  1. say your whole database is lost, stolen or hacked
  2. you have three relevant columns: "username", "passwordHashed", "salt"
  3. "salt" is a long string of random chars, unique for each user
  4. the hacker peruses the database and picks out a juicy username such as "admin"

What's to stop the hacker compiling a new comprehensive rainbow table where every result in his table is pre-hashed with the salt? After all, they only want to break that one account.

Today I've read many, many comments on Coding Horror posts generally saying that salting makes a rainbow table ineffective against discovering all the passwords in a database but I would imagine the savvy hacker would really only be interested in those few records with admin privileges, or perhaps in cracking a specific individual, in which case the other (x)-1 records in the DB are irrelevant.

So then the limiting factor is time and storage. Is that the best thing that we can say about our current security model? -- that it will take too much time and storage to attempt? I would point to a skilled bot-net developer who could slice computational time up against millions of machines. I would say that terabyte storage now is only a few harddrives.

Wouldn't it be better to keep a collection of salts outside of the database -- I'm not saying I know the answer to where, but just some place to keep the key away from the door?

Cirieno
If you're going after a single (salted) password, you obtain no benefit from precomputing hashes. The reason salts were invented was to avoid several users' passwords having the same hash, which would make the amortized cost of attacking a list of hashed passwords much lower than the sum of the costs of attacking each individual hashed password. See Ross' answer.
ninjalj