Form Based Authentication For Websites

Please help us create the definitive resource for this topic. We believe that Stack Overflow should not just be a resource for very specific technical questions, but also for general guidelines on how to solve variations on common problems. "Form Based Authentication For Websites" should be a fine topic for such an experiment.

It should include topics such as:

  • how to log in
  • how to remain logged in
  • how to store passwords
  • using secret questions
  • forgotten password functionality
  • OpenID
  • "Remember me" checkbox
  • Browser autocompletion of usernames and passwords
  • secret urls (public urls protected by digest)
  • checking password strength
  • email validation
  • and much more

It should not include things like:

  • roles and authorization
  • http basic authentication

Please help us by

  1. Suggesting subtopics
  2. Submitting good articles about this subject
  3. Editing the official answer (as soon as you have enough karma)

UPDATE: See the terrific 7-part series by Jens Roland below.

+15  A: 

List of external resources

Dos and Don’ts of Client Authentication on the Web (PDF)

21 page academic article with many great tips.

Ask YC: Best Practices for User Authentication

Forum discussion on the subject

You're Probably Storing Passwords Incorrectly

Introductory article about storing passwords

Discussion: Coding Horror: You're Probably Storing Passwords Incorrectly

Forum discussion about CodingHorror article.

Never store passwords in a database!

Another warning about storing passwords in the database.

Password cracking

Wikipedia article on weaknesses of several password hashing schemes.

Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes

Discussion about rainbow tables and how to defend against them, and against other threats. Includes extensive discussion.

Michiel de Mare
+25  A: 

Definitive Article (draft, to be edited as a wiki, if you can edit this, you're welcome to)

Sending credentials

The only practical way to send credentials 100% securely is by using SSL. Using Javascript to hash the password is not safe. (TODO citation required). There's another secure method called SRP but it's patented and there aren't any good implementations available.

Storing passwords

Don't ever store passwords as plaintext in the database. Not even if you don't care about the security of your own site. Assume that some of your users will reuse the password of their online bank account. So, store the hashed password, and throw away the original. And make sure the password doesn't show up in access logs or application logs. The best hashing function seems to be bcrypt.

Hashes by themselves are also insecure. For instance, identical passwords mean identical hashes. Instead, store the salted hash. A salt is a random string that is combined with the password before hashing; use a different (random) salt per user. The salt is a public value, so you can store it with the hash in the database.
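A minimal sketch of per-user salted hashing, for illustration only. It uses PBKDF2 from the Python standard library as a stand-in for bcrypt (which needs a third-party package); the function names and the iteration count are assumptions, not part of the original draft.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext is never stored."""
    salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because the salt differs per user, two users with the same password end up with different stored digests.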

This means that you can't send users their forgotten password (because you only have the hash). Don't reset a user's password unless you have authenticated the user: users must prove that they know the answer to the security question, or that they are able to read emails sent to the stored (and validated) email address.

Security questions

(TODO why are they secure? or are they?)

Session cookies

After the user logs in, the server sends the user a session cookie. The server can retrieve the username or id from the cookie, but nobody else can generate such a cookie (TODO explain mechanisms). Cookies can be hijacked (TODO really? how?), so don't send persistent cookies. If you want to autologin your users, you can set a persistent cookie, but you should set a flag that the user has auto-logged in, and needs to login for real for sensitive operations (TODO is this correct? I think this is what Amazon does.)
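One possible mechanism for a cookie that "nobody else can generate" is to sign the cookie value with a key only the server knows. The sketch below assumes an HMAC-signed value; the names `SERVER_KEY`, `make_cookie` and `read_cookie` are hypothetical, and a real deployment would also need expiry and the Secure/HttpOnly cookie flags.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # assumed: generated once and kept secret on the server


def _sign(user_id: str) -> str:
    return hmac.new(SERVER_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def make_cookie(user_id: str) -> str:
    """Return 'user_id.signature'; only the key holder can produce the signature."""
    return f"{user_id}.{_sign(user_id)}"


def read_cookie(cookie: str):
    """Return the user id if the signature checks out, otherwise None."""
    user_id, sep, sig = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    if hmac.compare_digest(sig.encode("utf-8"), _sign(user_id).encode("utf-8")):
        return user_id
    return None
```

A signature stops forgery, not theft: anyone who copies a valid cookie can still replay it, which is why the hijacking concern above still applies.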

Michiel de Mare
Given the recent MITM vulnerability surrounding signed SSL certificates (https://blog.startcom.org/?p=145), a combination of SSL and some kind of challenge-response authentication (there are alternatives to SRP) is probably a better solution.
Kevin Loney
A lot of this stuff is situational. I tend not to use session cookies at all. Cookies getting hijacked is almost always the server's fault. Man-in-the-middle attacks and packet sniffing aren't that common.
Shawn Simon
+3  A: 

This kind of project, though worthy, doesn't seem like a good fit for a questions-and-answers site like Stack Overflow.

It sounds like a best practice wiki.

If there were a question: "What are some best practices when developing website authentication?" then a link to the wiki would be the answer.

But... I don't think there should be any wiki/collaborative document writing system integrated directly into Stack Overflow.

Good idea though.

Just my 2 Cents.

Allain Lalonde
StackOverflow *is* a wiki/collaborative document system. That's the whole point! :)
Michael Pryor
@michaelpryor - Not really. Jeff and Joel might call it a wiki, but just allowing people to edit the same post doesn't make SO a wiki. SO is a forum.
Robert Paulson
@Robert "but just allowing people to edit the same post doesn't make SO a wiki." ... forgive me for a naive question but isn't that the definition of a wiki?
Swaroop C H
A: 

Great stuff but I suggest an "article" tag on this to differentiate it from questions.
I wanted to downvote it for not being a question, but credit is really due for such hard work.

Still, tag it article or something.

paan
A: 

As others have commented, this is a tricky one, verging on the edge of breaking the SO model. If individual pages must strictly be questions, this should be rephrased as something like "What are some best practices when developing website authentication?", as Allain suggests. However, that still raises the issue of how it should be answered.

One option is to provide a hierarchy of questions, with more general questions ["How do I write a C program" ;)] at the top of the tree, and more specific towards the bottom. Otherwise, we face the prospect of enormous questions which users will have to trawl through for their specific answer.

I'm also concerned about long pages caused by non-definitive answers, or asides such as this one. If a specific answer becomes 'stable', should there be a mechanism to hide all the irrelevant stuff, which then remains viewable on request?

Finally, many questions do not have a definitive answer, this being a good example. The authentication used for a banking site versus a personal web log should, almost certainly, be quite different, and this would - somehow - need to be reflected in any definitive answer.

There's more on this here, and probably on a huge number of other questions.

Bobby Jack
A: 

@[Michiel de Mare]:

The only practical way to send credentials 100% securely is by using SSL.

No practical way to send credentials is 100% secure.

Dmitry Shechtman
+2  A: 

User Authentication on the World Wide Web is old, but a good primer read nonetheless.

hendry
+4  A: 

See also Wikibooks PHP Programming: User login systems.

A: 

Hash the password on the client side (using JavaScript). Include user name and server-generated challenge to the hash.

http://blog.asgeirnilsen.com/2005/11/password-authentication-without.html

Asgeir S. Nilsen
A: 

OpenID? Seems like a good idea. Or seemed like a good idea, till I read this blog entry criticizing it: http://idcorner.org/2007/08/22/the-problems-with-openid/

Turns out the guy writing the blog...his company, Credentica, just got bought by Microsoft. His blog has lots of well thought-out entries: http://idcorner.org/2005/02/24/an-introduction-to-user-identifiers-part-1-of-4/

His MIT PhD was published as a book, and is available online: http://www.credentica.com/the_mit_pressbook.html

My opinion is that, even if I'm lost by chapter 2 of the book, and even if Credentica is now owned by the evil empire, the fact that Brands publishes all this stuff about his system makes me much more confident in it than in OpenID.

ja
I don't see how him openly publishing the internals of his system makes it any better than OpenID, whose internals are also openly published. *shrugs*
Tchalvak
A: 

If we are talking about general applications with a web interface (not just Internet applications): in a controlled environment such as an internal network with no more than 100 users, or for web service authentication, digital certificates are a good solution. I prefer to restrict them to a small number of users because they are not easy to administer. I'm not talking about SSL; I'm talking about client authentication through digital certificates.

VP
+5  A: 

(Another nugget-sized post from Jens Roland, yum..)

I think writing a canonical document about web application authentication is a phenomenal idea, but we're probably reinventing the wheel. A thorough (I said thorough) look at the following list of links should give a pretty good idea of how to approach the problem:

MUST-READ LINKS About Web Authentication

OWASP Guide To Authentication

Dos and Don’ts of Client Authentication on the Web (very readable MIT research paper)

Charles Miller's Persistent Login Cookie Best Practice

Wikipedia: HTTP cookie

Personal knowledge questions for fallback authentication: Security questions in the era of Facebook (very readable Berkeley research paper)

...as well as the list of links Michiel de Mare posted earlier


Still, FWIW, I'll gladly contribute my thoughts on the subject. These posts were going to turn into a blog post sometime soon anyway, and Stack Overflow is as good a place as any to publish them.

(especially since Googling my name now returns my Stack Overflow profile in the Top 5 -- and that's after just 4 days on the site! Remind me to meet the SEO guy (Jeff, would that be you?) and shake his hand)

Note however: I won't be addressing every bullet from the original post, just the ones where I feel I can contribute some actual guidance. A few have already been answered, and there are a couple I'm not entirely sure I understand (case in point: what do you mean by a 'secret url'?).

Jens Roland
+46  A: 

PART I: How To Log In

  1. As a rule, don't use CAPTCHAs. They are annoying, often aren't human-solvable, most of them are ineffective against bots, all of them are ineffective against cheap third-world labor (according to OWASP, the current sweatshop rate is $12 per 500 tests), and CAPTCHAs are technically illegal in most countries (see link number 1 from the MUST-READ list). If you MUST use a CAPTCHA, for the love of God, don't write your own. Use reCAPTCHA. At least it's OCR-hard by definition (since it uses already OCR-misclassified book scans).

  2. The only (currently practical) way to protect against interception (packet sniffing) during login is by using a certificate-based encryption scheme (e.g. SSL) or a proven & tested challenge-response scheme (e.g. the Diffie-Hellman-based SRP). Any other method can be easily circumvented by an eavesdropping attacker. On that note: hashing the password client-side (e.g. with Javascript) is useless unless it is combined with one of the above - i.e. either securing the line with strong encryption or using a tried-and-tested challenge-response mechanism (if you don't know what that is, just know that it is one of the most difficult to prove, most difficult to design, and most difficult to implement concepts in digital security). Hashing the password is effective against password disclosure, but not against replay attacks, Man-In-The-Middle attacks / hijackings, or brute-force attacks (since we are handing the attacker the username, salt, and hashed password).

Continued...

Jens Roland
Thanks for adding the links etc., Paul
Jens Roland
Well, I don't really agree with the Captcha part, yes Captchas are annoying and they can be broken (except recaptcha but this is barely solvable by humans!) but this is exactly like saying don't use a spam filter because it has less than 0.1% false negatives .. this very site uses Captchas, they are not perfect but they cut a considerable amount of spam and there's simply no good alternative to them
Waleed Eissa
Why is this post broken into parts? Why not combine it to form a single piece?
Niyaz
Niyaz: because separating it into parts makes it possible for the reader to skip (and link) to the parts he's currently working on, and keeps the comments grouped by subtopic
Jens Roland
the current rate of people solving is $1 per 1000 captchas (with discounts for bulk users)
roddik
+17  A: 

PART II: How To Remain Logged In - The Infamous "Remember Me" Checkbox

Persistent Login Cookies ("remember me" functionality) are a danger zone; on the one hand, they are entirely as safe as conventional logins when users understand how to handle them; and on the other hand, they are an enormous security risk in the hands of most users, who use them on public computers, forget to log out, don't know what cookies are or how to delete them, etc.

Personally, I want my persistent logins for the web sites I visit on a regular basis, but I know how to handle them safely. If you are positive that your users know the same, you can use persistent logins with a clean conscience. If not - well, then you're more like me; subscribing to the philosophy that users who are careless with their login credentials brought it upon themselves if they get hacked. It's not like we go to our users' houses and tear off all those facepalm-inducing Post-It notes with passwords they have lined up on the edge of their monitors, either. If people are idiots, then let them eat idiot cake.

Of course, some systems can't afford to have any accounts hacked; for such systems, there is no way you can justify having persistent logins.

If you DO decide to implement persistent login cookies, this is how you do it:

  1. First, follow Charles Miller's 'Best Practices' article. Do not get tempted to follow the 'Improved' Best Practices linked at the end of his article. Sadly, the 'improvements' to the scheme are moot.

  2. And DO NOT STORE THE PERSISTENT LOGIN COOKIE (TOKEN) IN YOUR DATABASE, ONLY A HASH OF IT! The login token is Password Equivalent, so if an attacker got his hands on your database, he could use the tokens to log in to any account, just as if they were cleartext login-password combinations. Therefore, use strong salted hashing (bcrypt / phpass) when storing persistent login tokens.
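A rough sketch of the "store only a hash of the token" rule. Two simplifications are assumed here and should be read as such: the `token_store` dict is a hypothetical stand-in for a database table, and plain SHA-256 is used instead of the bcrypt / phpass hashing the answer calls for, on the assumption that a long random token cannot be guessed the way a human-chosen password can.

```python
import hashlib
import secrets

token_store = {}  # hypothetical stand-in for a DB table: token hash -> user id


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(user_id: str) -> str:
    """Create a random token, keep only its hash, hand the plaintext to the browser."""
    token = secrets.token_urlsafe(32)
    token_store[_hash(token)] = user_id
    return token


def redeem_token(token: str):
    """Return the user id for a valid token and invalidate that one token only."""
    return token_store.pop(_hash(token), None)
```

Since each token has its own row, a user can hold one token per browser; redeeming one leaves the others untouched.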

<<<prev | next>>>

Jens Roland
This has some serious problems, it disallows the user from being authenticated in 2 browsers or locations at once.
Shawn Simon
Ouch. You're right. Do you have a good idea for remedying that?
Jens Roland
There's nothing in the design which prevents a user from having multiple active tokens, unless you are clearing all tokens for the user when they use one, rather than just the token they present.
Paul Dixon
why are the improvements moot?
cherouvim
The answer instructs readers to avoid following the "Improved" article. Why is that? Why are the proposed improvements moot?
cherouvim
+18  A: 

PART III: Using Secret Questions

Don't. Never ever use 'secret questions'. Read the paper from link number 5 from the MUST-READ list. You can ask Sarah Palin about that one, after her Yahoo email account got hacked during the presidential campaign because the answer to her 'security' question was... (wait for it) ... "Wasilla High School"!

Even with user-specified questions, it is highly likely that most users will choose either:

  • A 'standard' secret question like mother's maiden name or favourite pet

  • A simple piece of trivia that anyone could lift from their blog, LinkedIn profile, or similar

  • Any question that is easier to answer than guessing their password. Which, for any decent password, is every question conceivable.

In conclusion, security questions are inherently insecure in all their forms and variations, and should never be employed in an authentication scheme for any reason.

The only reason anyone still uses security questions is that it saves the cost of a few support calls from users who can't remember their email passwords to get to their reactivation codes. At the expense of security and Sarah Palin's reputation, that is. Worth it? You be the judge.

<<<prev | next>>>

Jens Roland
+11  A: 

PART IV: Forgotten Password Functionality

I already mentioned why you should never use security questions for handling forgotten/lost user passwords. There are at least two more all-too-common pitfalls to avoid in this field:

  1. Don't RESET users' passwords no matter what - 'reset' passwords are harder for the user to remember, which means he MUST either change it OR write it down - say, on a bright yellow Post-It on the edge of his monitor. Instead, just let users pick a new one right away - which is what they want to do anyway.

  2. Always hash the lost password code/token in the database. AGAIN, this code is another example of a Password Equivalent, so it MUST be hashed in case an attacker got his hands on your database. When a lost password code is requested, send the plaintext code to the user's email address (and don't accept an input field for this: to see why, check out this excellent article about SQL Injection in a 'forgotten password' field), then hash it, save the hash in your database -- and throw away the original. Just like a password or a persistent login token.

<<<prev | next>>>

Jens Roland
Regarding 2: The referenced article was more about SQL injection vulnerabilities in general than website authentication/forgotten password functionality. The author finished with the note: "We'd like to emphasize that though we chose the "Forgotten password" link to attack in this particular case, it wasn't really because this particular web application feature is dangerous. It was simply one of several available features that might have been vulnerable, and it would be a mistake to focus on the "Forgotten password" aspect of the presentation."
Anders Fjeldstad
+10  A: 

PART V: Checking Password Strength

First, you'll want to read this small article for a reality check: The 500 most common passwords

Okay, so maybe the list isn't the canonical list of most common passwords on any system anywhere ever, but it's a good indication of how poorly people will choose their passwords when there is no enforced policy in place. Plus, the list looks frighteningly close to home when you compare it to the publicly available analyses of 40,000+ recently stolen MySpace passwords.

Well, enough MySpace-bashing for now. Moving on..

So: With no minimum password strength requirements, 2% of users use one of the top 20 most common passwords. Meaning: if an attacker gets just 20 attempts, 1 in 50 accounts on your website will be crackable.

Luckily, thwarting it is as easy as dropping a Javascript validation algorithm on your user registration form (and duplicating it server-side in case Javascript is turned off). There are simple algorithms for determining password strength client-side, and although I haven't tested it properly, I would recommend Tyler Atkins' password strength checker.
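For the server-side duplicate of such a check, here is a rough sketch. It is not Tyler Atkins' algorithm; the rules (minimum length, a common-passwords list, a count of character classes), the thresholds and the tiny sample list are all assumptions chosen for illustration.

```python
# Tiny illustrative sample; a real list would hold hundreds of entries.
COMMON_PASSWORDS = {"123456", "password", "qwerty", "letmein"}


def password_problems(password: str, min_length: int = 9) -> list[str]:
    """Return a list of human-readable problems; an empty list means 'acceptable'."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("on the common-passwords list")
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),
    ]
    if sum(classes) < 3:
        problems.append("uses fewer than 3 character classes")
    return problems
```

Returning the list of problems, rather than a bare yes/no, lets the registration form tell the user what to fix.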

<<<prev | next>>>

Jens Roland
Isn't there a bit of a twist to this, I've found that asking users to generate strong passwords means they'll often pick something hard to remember, and as such - write it down to remind them..
meandmycode
True, but we can't prevent people from being morons, we can only 1) educate (on how to pick easy-to-remember-but-hard-to-crack passwords), and 2) enforce minimum standards.
Jens Roland
Is it wrong to write your passwords down? How's a hacker going to get access to that? My mom maybe...
DutrowLLC
In and of itself, writing down your passwords is only as unsafe as the place where they are written down. If they are written on a notepad in a safety deposit box, no problem. If they are written on a Post-It tacked to your display in plain sight of your coworkers and guests, that's bad. If they're written down in a text file on an unpatched Windows box with Internet Explorer -- even worse.
Jens Roland
Should 'thequickbrownfoxjumpsoverthelazydog' be considered a very strong password? According to Tyler Atkins' test it is.
Wayne Werner
@Wayne: that's debatable. Naturally, Tyler Atkins' test uses a common-passwords-list which can always be expanded, and you might want to add a body of common phrases, sayings, song lyrics and quotes; on the other hand, I'd be surprised to see a dictionary attack in the wild that included a lot of 35-character phrases like that one.
Jens Roland
@Jens - of course if any crackers *or* hackers are smart enough to read this post with any regularity, they should put that one in now ;)
Wayne Werner
@Wayne: you bet
Jens Roland
+16  A: 

PART VI: Much More - Or: Preventing Rapid-Fire Login Attempts

First, have a look at the numbers: Password Recovery Speeds - How long will your password stand up

If you don't have the time to look through the tables in that link, here's the gist of them:

  1. It takes virtually no time to crack a weak password, even if you're cracking it with an abacus

  2. It takes virtually no time to crack an alphanumeric 9-character password, if it is case insensitive

  3. It takes virtually no time to crack an intricate, symbols-and-letters-and-numbers, upper-and-lowercase password, if it is less than 8 characters long (a desktop PC can search the FULL KEYSPACE up to 7 characters in less than 90 days)

  4. It would, however, take an inordinate amount of time to crack even a 6-character password, if you were limited to one attempt per second!
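The arithmetic behind point 4 can be checked with a few lines. This is a back-of-the-envelope sketch: the alphabet sizes (26 lowercase letters, 62 letters and digits) are assumptions, and the figures are for searching the full keyspace, so an attacker would need about half that time on average.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(alphabet_size: int, length: int, attempts_per_second: float) -> float:
    """Years needed to try every password of the given length at the given rate."""
    keyspace = alphabet_size ** length
    return keyspace / attempts_per_second / SECONDS_PER_YEAR


# 6 lowercase letters at one attempt per second
print(round(years_to_exhaust(26, 6, 1), 1))  # 9.8
# 6 characters from a-z, A-Z, 0-9 at one attempt per second
print(round(years_to_exhaust(62, 6, 1), 1))  # 1801.1
```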

So what can we learn from these numbers? Well, lots, but we can focus on the most important part: the fact that preventing large numbers of rapid-fire successive login attempts (i.e. the brute force attack) really isn't that difficult. But preventing it right isn't as easy as it seems.

Generally speaking, you have three choices that are all effective against brute-force attacks (and dictionary attacks, but since you are already employing a strong passwords policy, they shouldn't be an issue):

  • Present a CAPTCHA after N failed attempts (annoying as hell and often ineffective -- but I'm repeating myself here)

  • Lock accounts and require email verification after N failed attempts (this is a DoS attack waiting to happen)

  • And finally, login throttling: that is, setting a time delay between attempts after N failed attempts (yes, DoS attacks are still possible, but at least they are far less likely and a lot more complicated to pull off)

Best practice #1: A short time delay that increases with the number of failed attempts, like:

  • 1 failed attempt = no delay
  • 2 failed attempts = 2 sec delay
  • 3 failed attempts = 4 sec delay
  • 4 failed attempts = 8 sec delay
  • 5 failed attempts = 16 sec delay
  • etc.

DoS attacking this scheme would be very impractical, but on the other hand, potentially devastating, since the delay increases exponentially. A DoS attack lasting a few days could suspend the user for weeks.
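The doubling schedule from best practice #1 fits in a single function. A minimal sketch, with the function name made up for illustration:

```python
def exponential_delay(failed_attempts: int) -> int:
    """Seconds to wait before the next attempt: 0, 2, 4, 8, 16, ..."""
    if failed_attempts <= 1:
        return 0  # the first failure is free
    return 2 ** (failed_attempts - 1)


print([exponential_delay(n) for n in range(1, 6)])  # [0, 2, 4, 8, 16]
```

The lack of an upper bound is what makes the scheme risky: after 20 failures the delay is 2**19 seconds, roughly six days.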

Best practice #2: A medium length time delay that goes into effect after N failed attempts, like:

  • 1-4 failed attempts = no delay
  • 5 failed attempts = 15-30 min delay

DoS attacking this scheme would be quite impractical, but certainly doable. Also, it might be relevant to note that such a long delay can be very annoying for a legitimate user. Forgetful users will dislike you.

Best practice #3: Combining the two approaches - either a fixed, short time delay that goes into effect after N failed attempts, like:

  • 1-4 failed attempts = no delay
  • 5+ failed attempts = 20 sec delay

Or, an increasing delay with a fixed upper bound, like:

  • 1 failed attempt = 5 sec delay
  • 2 failed attempts = 15 sec delay
  • 3+ failed attempts = 45 sec delay

This final scheme was taken from the OWASP best-practices suggestions (link 1 from the MUST-READ list), and should be considered best practice, even if it is admittedly on the restrictive side.
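A sketch of how the capped 5 / 15 / 45 second schedule could be enforced. The `LoginThrottle` class, its in-memory dict and the injectable clock are assumptions made for illustration; a real site would keep these counters in shared storage.

```python
import time


def capped_delay(failed_attempts: int) -> int:
    """5 s after the first failure, 15 s after the second, 45 s from the third on."""
    if failed_attempts <= 0:
        return 0
    return min(5 * 3 ** (failed_attempts - 1), 45)


class LoginThrottle:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = {}  # username -> (failure count, time of last failure)

    def seconds_to_wait(self, username: str) -> float:
        """How long this account must still wait before another attempt is accepted."""
        count, last = self._failures.get(username, (0, 0.0))
        return max(0.0, last + capped_delay(count) - self._clock())

    def record_failure(self, username: str) -> None:
        count, _ = self._failures.get(username, (0, 0.0))
        self._failures[username] = (count + 1, self._clock())

    def record_success(self, username: str) -> None:
        self._failures.pop(username, None)  # a good login resets the counter
```

The login handler would call `seconds_to_wait` first and reject the attempt (without checking the password) while the result is above zero.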

As a rule of thumb however, I would say: the stronger your password policy is, the less you have to bug users with delays. If you require strong (case-sensitive alphanumerics + required numbers and symbols) 9+ character passwords, you could give the users 2-4 non-delayed password attempts before activating the throttling.

DoS attacking this final login throttling scheme would be very impractical. And as a final touch, always allow persistent (cookie) logins (and/or a CAPTCHA-verified login form) to pass through, so legitimate users won't even be delayed while the attack is in progress. That way, the very impractical DoS attack becomes an extremely impractical attack.

Additionally, it makes sense to do more aggressive throttling on admin accounts, since those are the most attractive entry points.

<<<prev | next>>>

Jens Roland
Why not just lock the account after 3-6 failed attempts?
Jess
@LuckyLindy: Because an attacker could then abuse that to lock out any user he wanted. That's the DoS (Denial of Service) attack I mention
Jens Roland
@LuckyLindy - In my office if you try like 10 failed logins, the Windows account locks up and you need to talk to an admin to free you. Needless to say, people prank each other all the time using this "feature".
Mikle
+7  A: 

PART VII: Distributed Brute Force Attacks

Just as an aside, more advanced attackers will try to circumvent login throttling by 'spreading their activities':

  • Distributing the attempts on a botnet to prevent IP address flagging

  • Rather than picking one user and trying the 50,000 most common passwords (which they can't, because of our throttling), they will pick THE most common password and try it against 50,000 users instead. That way, not only do they get around maximum-attempts measures like CAPTCHAs and login throttling, their chance of success increases as well, since the number 1 most common password is far more likely than number 49,995

  • Spacing the login requests for each user account, say, 30 seconds apart, to sneak under the radar

Here, the best practice would be logging the number of failed logins, system-wide, and using a running average of your site's bad-login frequency as the basis for an upper limit that you then impose on all users.

Too abstract? Let me rephrase:

Say your site has had an average of 120 bad logins per day over the past 3 months. Using that (running average), your system might set the global limit to 3 times that -- i.e. 360 failed attempts over a 24 hour period. Then, if the total number of failed attempts across all accounts exceeds that number within one day (or even better, monitor the rate of acceleration and trigger on a calculated threshold), it activates system-wide login throttling - meaning short delays for ALL users (still, with the exception of cookie logins and/or backup CAPTCHA logins).
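The worked example above translates directly into code. A minimal sketch: the function names and the plain list of daily counts are assumptions, and the factor of 3 comes from the example.

```python
def global_limit(daily_failures: list[int], multiplier: float = 3.0) -> float:
    """Site-wide failed-login ceiling: a multiple of the running daily average."""
    return multiplier * sum(daily_failures) / len(daily_failures)


def should_throttle_everyone(failures_today: int, daily_failures: list[int]) -> bool:
    """True once today's failed logins across all accounts exceed the ceiling."""
    return failures_today > global_limit(daily_failures)


history = [120] * 90  # about 3 months at 120 bad logins per day
print(global_limit(history))                   # 360.0
print(should_throttle_everyone(361, history))  # True
```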

EDIT: Posted a question with more details and a really good discussion of how to avoid tricky pitfalls in fending off distributed brute force attacks

<<<prev

Jens Roland
A: 

Very useful stuff.

Any idea how systems like UNIPASS work? It claims to be a digital certificate but I wasn't aware of using such schemes for website user validation. Any other examples out there that use something like this?

Manabenz
