views:

298

answers:

6

In the process of building what I'd like to hope is a properly-architected authentication mechanism, I've come across a lot of materials that specify that:

  1. user passwords must be salted
  2. the salt used should be sufficiently random and generated per-user
  3. ...therefore, the salt must be stored with the user record in order to support verification of the user password

I wholeheartedly agree with the first and second points, but it seems like there's an easy workaround for the latter. Instead of doing the equivalent of (pseudocode here):

salt = random();
hashedPassword = hash(salt . password);
storeUserRecord(username, hashedPassword, salt);

Why not use the hash of the username as the salt? This yields a domain of salts that is well-distributed, (roughly) random, and each individual salt is as complex as your salt function provides for. Even better, you don't have to store the salt in the database -- just regenerate it at authentication-time. More pseudocode:

salt = hash(username);
hashedPassword = hash(salt . password);
storeUserRecord(username, hashedPassword);

(Of course, hash in the examples above should be something reasonable, like SHA-512, or some other strong hash.)

This seems reasonable to me given what (little) I know of crypto, but the fact that it's a simplification over widely-recommended practice makes me wonder whether there's some obvious reason I've gone astray that I'm not aware of.

EDIT Some appear to not grok what the question is. I no way am I suggesting that no salt be used. Referring to TheRook's edited answer: I'm familiar with the references noted in those CWE's. The core question I have is: why is hash(username) a predictable salt?

EDIT 2 Thanks to all those that provided answers; biffabacon directly addressed my core question in his 2nd paragraph (basically, anything you can do to maximize the domain of the salts being used and therefore the hashed passwords being generated is good), but there's lots of tasty info in various comments on this question.

+7  A: 

The reason for salts was to prevent cryptanalysis attacks. Unique salt per user means you can't tell if two users have the same password. Nondeterministic salt per user means you can't tell if the same username:password is used on two systems.

Don't try to outclever the salt. If you begrudge the space, then don't use them, and put your effort into protection your data (and the backups!) directly.

Jason
This isn't the only reason why salts are used, but this is a valid reason to use a salt. (read my post.) I gave you a +1 becuase most people don't realize this property of salts, which is mentioned in crypto text books. There is a similar concern with Blockciphers and that is why IV's are used.
Rook
That's very true of modern application of the salt, but if you think about it, it's just a more sinister version of the same attack. With a precomputed dictionary or a rainbow table (which our OP should google), you're comparing someone's password file to every file that could exist.Which brings me to another concern, and that is to make certain you get enough data into the hash to fill its natural block size at least once (ie your input has to be bigger than the output) hash(salt + username + pad) where the pad can be a known value so long as it's AFTER one of the random fields.
Jason
@Jason: regarding "filling the hash's natural block size": that is to ensure proper distribution of hashed values over the domain of the hash, right? So, for, say, SHA-512, does that imply that minimum input to the hash function should be 512 bytes long? Any references on this particular sub-topic? Some off-the-cuff googling didn't turn up anything obviously useful.
Chas Emerick
@Chas: No, you're right, I spoke out of turn. SHA actually already has a logic for dealing with odd-sized inputs (any input not adding up to a whole 'block size' is end padded by zeroes). I was thinking about cryptographic PRNG's, which are a whole other ball of wax, not relevant to this discussion, and the above is still inaccurate even in that context.
Jason
@Jason this is an important security feature because if you have sql injection in the application and you **do not** have the salt. Then an attacker can change their password to something very common like "password" and then check their hash against all others in the database using sql injection, then the attacker can find others with the password of "password". However you did not address CWE-760, so this system is still technicality **vulnerable** to attack.
Rook
@Rook: I'm recommending a per-user salt (actually per-password change would be even better). I'm not sure how that doesn't address 760.
Jason
@Rook: I would also like to point out that in the realm of security questions, if you demand that all answers to be in the form of an accurately detailed attack against the alternative to The Right Way, then you're going to spend a lot of time correcting people who aren't as dumb as you think.
Jason
+6  A: 

The salt helps protect against an attacker using a precomputation or dictionary attack. When a salt is used, the attacker needs to create a separate dictionary for every salt value. However, if the salt isn't random you give the attacker an advantage, because they can create dictionaries that are more likely than others. For example, they could create a dictionary using a salt of jsmith (or hash of jsmith). For this reason, it is generally a good idea for the salt to be random.

Comparing a precomputation attack against a hash(username) salt and a random salt: let's say for example the attacker decides to create dictionaries for the most common 1000 usernames and, say, half a dozen different hash algs; that's 6000 salts and so 6000 dictionaries. If a random 32 bit salt is used that's 2^32 or circa 4.2 billion dictionaries. So when the salt space is dramatically reduced (like it is by using hash(username)) precomputed attacks become much more feasible

bignum
It seems a little crazy that an attacker would generate dictionaries that assume salts of hash(username), especially (a) given the variety of hash functions and block sizes in the wild, and (b) the fact that an attacker wouldn't be able to infer the hash used to generate the salt from the salted password they have available to them.
Chas Emerick
@Chas: Read enough advisories, you will probably see people fall down with this by making a small app, turning it into a high profile app, and they don't re-check their assumptions. But you're right in that there's a point where if you're a big enough target, they'll find a cheaper attack than this. Like the phishing. The good news is that it's not very hard to make authentication not be your tall tent pole. Use salts (plural), use cryptographic hashes, and use SSL for login pages (salted password files are particularly sensitive to Man in the Middle attacks. *cue Rook to detail how*)
Jason
@Jason: Yup, we hit all three (our entire site shall be run over SSL). I've now long since just tossed in large, random, per-user, regenerated-on-each-pw-change salts (co-located with the passwords, sorry Rook ;-), as it's easy enough, and I don't mind being a little more paranoid to match up with what is considered best practices. Unfortunately, it doesn't look like I'm going to find clarity w.r.t. the original question. Thanks for the tips otherwise, tho. :-)
Chas Emerick
@Chas Emerick: added an example - hope this helps.
bignum
Generating rainbow tables for 'root', 'administrator', 'postmaster', and so forth would be a totally reasonable thing for an attacker to do if the OP's proposal was widely used (which has to be a consideration any time you're pondering using something).
Nick Johnson
@Chas Emerick its cool it looks like i'm the only one on SO who has actually broken a password hash ;)
Rook
@bignum Why are you only taking rainbow-crack style precomputed attacks into consideration? What about John The Ripper?
Rook
+5  A: 

The point of the salt is to prevent the attacker from performing parallel attacks. That parallelism must be understood both space- and time-wise; roughly speaking, this means sharing the attack cost between two or more attacks.

For instance, consider a non-salted hashed password setting. The attacker can hash all words in a dictionary for a cost proportional to the size of the dictionary, and check those hashed words with regards to several hashed passwords. This can be simultaneous (the attacker has a list of hashed passwords and wants to crack one) or iterative (the attacker precomputes his hashed dictionary, then uses it as a tool against several passwords in distinct systems). Either way, this is cost sharing.

The salt is some data which should be somewhat unique to each hashed password instance. Salting prevents such cost sharing, to the extent of the uniqueness of the salt.

Using the user name (or hash thereof) as a salt leverages user name uniqueness: usually, on a given system at a given time, user names are unique. This prevents locally space-wise sharing: if the attacker gets a snapshot of all hashed passwords, he cannot attack them in parallel with cost sharing; he will have to incur the hashed dictionary cost for every attacked password. However, this does not prevent time-wise sharing (the attacker precomputes a hashed dictionary with the salt corresponding to user "bob" and will regularly try to guess Bob's password, assuming that Bob changes his password on a regular basis, e.g. because this is mandated by his system administrator). This does not prevent either some global sharing (there are several -- many -- systems out there, with a user going under the name of "bob").

So using the user name as salt is not bad; this is better than using no salt at all. But a random salt is still better, because it will change even in situations where the the user name is kept unchanged (a user changing his password; two users on distinct systems with the same name).

Thomas Pornin
OK, that's an aspect I hadn't considered/read about before: ensuring that an attacker would have to regenerate tables for the same account each time the password is changed. That would certainly justify requiring a random per user salt added per password change, but that's different than the objective I see most frequently: to ensure that an attacker can't use the same table for more than one account. Am I wrong that hash(hash(username).password) accomplishes the latter?
Chas Emerick
hash(hash(username)||password) works great if you only worry about an attacker who gets a single snapshot of all hashed passwords _at a single given time_ and _on a single machine_. That's what user names give you: two distinct users cannot have the same name on the same machine at the same time. If you take time into account (a user changes his password) or multiple systems (there can be a "bob" on many systems, and it could be worthwhile for an attacker to precompute bob-tables), then the user name is not sufficient as a salt.
Thomas Pornin
@Thomas Pornin I disagree with most of what you said. Especially the part about parallelism, which could not be more incorrect. If a salt is properly implemented then an attacker will need a massive cluster for brute force.
Rook
caf
A: 

I'm going to take a slightly contrary view. A pure hash isn't the best idea for the reasons given (but better than nothing), but I think it's an entirely different story if you do a hash of an application-wide salt + username. It's still per-user but not something an attacker can easily guess. Obviously you'll want to make sure that the app-wide salt isn't visible to an attacker, e.g., by reading it from a file that's outside of the app server directory tree.

You can extend this a bit further with multiple hashes. That is, don't just use

  H(password.username.appsalt)

use something like

  h1 = H(password.appsalt1.username.password)
  h2 = H(password.appsalt2.username.password)
  h3 = H(password.appsalt3.username.password)
  H(h1.H(h2.H(h3))))

(where H() is the hash and '.' is simple concatenation.) The extra time won't have a noticeable impact on your application but makes the cost to an attacker MUCH higher.

I think multiple hashes are a good idea even if you store random hashes in the user authentication table. Again it won't have a noticeable impact on your application but will make it much harder for an attacker.

bgiles
Massive waste of resources with negligible gain in security, espically if you are using a secure message digest like sha256.
Rook
How many users do you have logging in simultaneously? :) Seriously, it isn't something I would have done a few years ago but I've seen previous strong hashes fall to the wayside and don't like putting my trust into a single pass any more.
bgiles
Don't forget you have to also make sure to sabotage your project so it never becomes a huge success. Or, you could just do it the right way. I still don't understand why people expend so much energy trying to break something so trivially simple to do the right way the first time. Just do it, and never look back.
Jason
Again, 'waste of resources' is a red herring here - if you waste resources, the attacker has to waste a large multiple of those resources, so it's a _good_ idea, not a bad one. That said, this scheme isn't a good idea - use an established, tested scheme, with multiple hashing and per-user, randomly generated salts.
Nick Johnson
Yes, I overstated the hashing a bit. :-) I probably wasn't clear but I also believe a random per-user salt is best but that's not always possible due to circumstances beyond our control. (e.g., an old-school DBA). I don't think the idea of one hash vs. two hashes is as clear-cut as people think though - I remember reading abstracts by cryptologists that compared different hashing approaches and some are much stronger than others with even just one extra hash. Think of the difference between H(s,m) vs H(m,s). I wish I could remember the cites - maybe in a latter comment.
bgiles
+1  A: 

If you use a random salt, you're pulling from a very large, random pool of possibilities. When you're pulling from usernames, you're pulling from nowhere near as large of a pool, as usernames are usually all lowercase, dictionary-word'ish, and are a subject of constraints that the OS/authentication system puts on the usernames (must start with a letter, no special characters, some OS's still require up to 8 char usernames). Lots of usernames are standardized, or at least popularized: root, administrator, bob, mary... you get the idea. Another problem is that usernames aren't usually protected, you can see them through apache's user directories, anonymous ftp often allows the public directory to group things up by username, etc. An attacker could just start by harvesting the usernames, and building themselves a very nice list of salts.

All this stuff adds up to one problem: higher probability of coming up with a list of salts that works.

This gives attackers an ability to do offline pre-calculation of usernames and their possible hashes, setting it up for a bruteforce attack. You might want to create a challenge-response mechanism to thwart that.

Marcin
A: 

There are 2 recognized vulnerabilities relating to the use of salts. The first is CWE-759, which states that a salt must be used for passwords. The 2nd vulenrablity is much more important, it is CWE-760: Use of a One-Way Hash with a Predictable Salt . The salting mechanism that you are purposing is a vulnerability according to CWE-760,

A salt should be generated using a large Cryptographically secure pseudo-random number. This number should be base256. A good size would be the same number of bits that the message digest produces. For instance SHA256 should have a 256bit salt. Both should have the same amount of entropy because they are both susceptible to brute force.

In order to break a salt of this size you'll need a rainbow table so large we don't even have a word for it.

Why not use the hash of the username as the salt?

A password hash cannot be broken until the salt is retrieved. Salts make precomputed attacks more resource intensive, but never impossible. The problem with using a hash of the user name is 2 fold. First of all you are computing 2 message digests which is a waste of resources. From a security perspective this salting mechanism is unsuitable becuase the username for web applications is often public knowledge. A salt must be a secret, ideally this secret is stored separately from the password hashes. If the salt is stored in the database along side the password hash, then SQL Injeciton can be used to obtain both values and then a simple dictionary attack can be used to break the hash.

To improve the secuirty provided by a salt it should be stored separately from the password hashes such that both must be compromised before the any hash can be broken. This can be accomplished by storing the salts in a separate database, or in a local flat file. Keep in mind that mysql's file_priv's could be use do to read this flat file, so make sure this is disabled.

Rook