views:

464

answers:

9

I would like to allow my users to use Unicode for their passwords.

However, I see that a lot of sites don't support it (e.g. Gmail, Hotmail).

So I'm wondering if there's some technical or usability issue that I'm overlooking.

If anything, I think it must be a usability issue. .NET accepts Unicode by default, and if Hotmail (er, the new Live mail) is built on that, I don't see why they would restrict it.

Has anyone encountered similar issues?

+17  A: 

I'm sure there is no technical problem; Gmail and Hotmail probably don't support it on purpose. Sites like these have a very wide audience and need to be accessible from everywhere.

Imagine a user whose password is in Japanese. If they travel and go to an Internet café that has no Japanese input support, they won't be able to log in.

Another problem is analyzing password complexity. It's not very difficult to make sure the user didn't type a common English word, but what about Chinese, Russian or Thai? Analyzing password complexity gets much harder with every language you add.

So if you want your system to be accessible, it's better to make sure users can type their password on any kind of device, OS or environment. Alphanumeric characters plus the most common symbols (!<>"#$%& etc.) are a good set that is available everywhere.

RageZ
+1: Good example
Ben S
When the Palm Pre first came out, you couldn't bring up the 'symbol' table on password fields, which made it impossible for me to log into most sites. They've fixed it, so I now have access to the characters I need, but even so, not all ASCII characters are available. (I used to put BEL (^G) in passwords a lot, after a former sysadmin told me it was possible. TAB (^I) is almost impossible to enter via web forms, and # can cause problems as the first character over some types of older terminal connections.)
Joe
Although French/Belgian AZERTY keyboard users have a hard time finding the 'common' symbols on a QWERTY keyboard... If you follow the same rationale, that would lead to supporting only letters and digits (for the user's sake).
xtofl
That's a great example. I figured that might have been some accessibility issue at play here.
KL90
A: 

I'm sure the multilingual counterparts of those sites do support Unicode. It sounds like a user-requirements issue rather than a technical challenge.

Kai
A: 

I would not be surprised if there is a technical issue with the server not being certain of the encoding the client is sending the password in.

However, I would guess that sites with mainly native-speaking Japanese, Chinese, Korean or Russian audiences use the respective commonly used non-ASCII character sets (Big5, EUC-KR, KOI8, etc.) for passwords. It may be worth researching how they cope with older web clients that use these non-Unicode encodings.

ndim
What are "respective non-ASCII character sets"? Do you mean Unicode?
dottedmag
No, I mean the non-Unicode encodings. Most of them probably include ASCII as a subset, but they are still quite different from Unicode. Big5, KOI8 and EUC-KR come to mind; http://en.wikipedia.org/wiki/Character_encoding has a more complete list.
ndim
+8  A: 

Generally I am strongly in favor of not restricting which characters are allowed in passwords. However, remember that you have to compare the input against something stored, which is either the password itself or a hash of it. In the first case you have to make sure the comparison is done correctly, which is much harder with Unicode than with ASCII alone. In the second case you have to make sure that the same password always produces exactly the same bytes to hash, whenever it is entered. Normalization forms may help here or be a curse, depending on who applies them.

For example, in an application I'm working on I am using a hash over a UTF-8 conversion of the password which was normalized beforehand to weed out potential problems with combining characters and such.
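Here is a minimal Python sketch of that approach, assuming NFC as the normalization form. The same visible word is typed once with a precomposed "ñ" (U+00F1) and once as "n" plus U+0303 COMBINING TILDE. The two versions hash differently as raw UTF-8 but identically after normalization. The function name `password_digest` is illustrative. Unsalted SHA-256 is used only to keep the comparison visible; it is not how passwords should be stored.

```python
import hashlib
import unicodedata

def password_digest(password: str, normalize: bool = True) -> str:
    """Hash the UTF-8 bytes of the (optionally NFC-normalized) password.

    SHA-256 without a salt is for illustration only; store a salted, slow hash in practice.
    """
    if normalize:
        password = unicodedata.normalize("NFC", password)
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

composed = "ma\u00f1ana"      # "mañana" with precomposed U+00F1
decomposed = "man\u0303ana"   # "mañana" as "n" + U+0303 COMBINING TILDE

# Without normalization the two renderings of the same word hash differently.
assert password_digest(composed, normalize=False) != password_digest(decomposed, normalize=False)
# With NFC applied first, they produce the same digest.
assert password_digest(composed) == password_digest(decomposed)
```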

The biggest problem the user may face is that they can't enter the password in some places, for example on a different keyboard layout. This is already the case for one of my passwords, but it has never been a problem so far. And after all, that's a decision the user makes in choosing their password, not one the application should make on their behalf. I doubt that users who happily use arbitrary Unicode in their passwords don't think about the problems that may arise with another keyboard layout. (This may be more of an issue for web-based services than anything else, though.)

There are cases where Unicode is rightly forbidden, though. One example is TrueCrypt, which forces the US keyboard layout for boot-time passwords (for full-volume encryption). No other layout is available at that point, so Unicode or any other keyboard layout would only cause problems.

However, that doesn't explain why sites like Gmail and Hotmail forbid Unicode in normal passwords. A warning might be nice, but forbidding it outright is wrong in my eyes.
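Here is a minimal sketch of "warn, don't forbid" in Python. For illustration only, it assumes that printable ASCII (U+0020 to U+007E) is the set of characters that can be typed almost anywhere. `password_warnings` is a hypothetical helper. The application would show its messages to the user but would still accept the password.

```python
from typing import List

def password_warnings(password: str) -> List[str]:
    """Return advisory messages about a password; never reject it outright."""
    warnings: List[str] = []
    # Assumption: printable ASCII (space through "~") is typeable on almost any keyboard.
    unusual = sorted({ch for ch in password if not (" " <= ch <= "~")})
    if unusual:
        shown = ", ".join(f"U+{ord(ch):04X}" for ch in unusual)
        warnings.append(
            f"Your password contains characters ({shown}) that may be hard to type "
            "on other keyboard layouts or devices."
        )
    return warnings
```

Because the function only returns messages, the policy decision stays with the user, which is the point the answer makes.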

Joey
@Johannes: +1 for warning the user rather than forbidding it
RageZ
+1. Good points.
devstuff
+1 for the warning as well! I also agree that an app shouldn't be the one to decide that. Love the idea.
KL90
+1. Also, please put the point about Unicode normalization in bold: if someone fails to do it, things will go really badly.
Sorin Sbarnea
A: 

I support Unicode passwords in all of my web applications. With a recent browser, visitors can use any code point in their preferred or native script.

For enhanced security I store a salted hash rather than using reversible encryption.

The important thing is to correctly normalize and encode the password string before adding the byte sequence to the hash (I prefer UTF-8 for endian independence).
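Here is one way to put those steps together in Python, as a sketch. PBKDF2-HMAC-SHA256 from `hashlib` stands in for "a salted hash", because the answer doesn't name an algorithm. NFC is assumed as the normalization form, following the comment below. The function names and the iteration count are illustrative.

```python
import hashlib
import hmac
import os
import unicodedata
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware

def _prepare(password: str) -> bytes:
    # Normalize first (NFC assumed), then encode as UTF-8 so the bytes are endian-independent.
    return unicodedata.normalize("NFC", password).encode("utf-8")

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage; a fresh random salt is drawn per password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _prepare(password), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", _prepare(password), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Normalization happens inside `_prepare`. As a result, a password stored from one input method still verifies when it is typed as a different but canonically equivalent code point sequence.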

devstuff
Thanks for the tips. I'll definitely be doing a salted hash as well (and UTF-8).
KL90
If you go this route (and any other route allowing Unicode characters, actually), make sure you read up on [Normalization](http://unicode.org/reports/tr15/) and use one consistently (preferably NFC)
Joachim Sauer
Thanks for that Joachim, forgot to mention that.
devstuff
A: 

Unicode sucks if you have to do programmatic matching. The "minus sign" and the "dash" look the same, but they may be separate code points. "n with a funny tilde over it" might be one character, or a letter followed by a combining diacritic.

If different clients encode the same visible text differently, passwords might not match even though they look the same. See "omg-ponies", a.k.a. "Humanity: Epic Fail".
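Python's `unicodedata` module makes the distinction concrete. Normalization can merge the two spellings of "ñ", and NFKC folds compatibility variants such as FULLWIDTH HYPHEN-MINUS. However, no normalization form maps MINUS SIGN (U+2212) to HYPHEN-MINUS (U+002D), because they are different characters rather than different encodings of one character. The helper name `same_after` is hypothetical.

```python
import unicodedata

def same_after(form: str, a: str, b: str) -> bool:
    """Return True if a and b become identical under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# "ñ" as one code point (U+00F1) vs "n" + COMBINING TILDE (U+0303): canonically equivalent.
print(same_after("NFC", "\u00f1", "n\u0303"))   # True
# MINUS SIGN (U+2212) vs HYPHEN-MINUS (U+002D): distinct characters, never unified.
print(same_after("NFKC", "\u2212", "-"))        # False
# FULLWIDTH HYPHEN-MINUS (U+FF0D) is a compatibility variant: NFKC folds it, NFC does not.
print(same_after("NFKC", "\uff0d", "-"))        # True
print(same_after("NFC", "\uff0d", "-"))         # False
```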

You can normalize, but what happens when:

  • the normalization rules change
  • you have some users with diacritics in their password
  • you have some users with combined letters in their password
  • the passwords are hashed, so you can't change the passwords

Guess what - you need to force a password reset on some of your users.

wisty
"n with a funny tilde over it" is ñ btw.
Henri Watson
A: 

Good idea.

It makes passwords stronger and gives users more freedom. It is also already supported by Windows (since at least Windows 2000), Active Directory and LDAP, and Novell (since at least 2004).
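The "stronger" claim can be made precise for randomly chosen passwords. A password of L characters drawn uniformly from an alphabet of N symbols carries H = L log2 N bits of entropy. The 20,000-character alphabet below is purely illustrative. Human-chosen passwords in any script have far less entropy than this bound.

```latex
\begin{aligned}
H &= L \log_2 N \\
8 \log_2 95 &\approx 8 \times 6.57 \approx 52.6 \text{ bits} && \text{(printable ASCII)} \\
8 \log_2 20\,000 &\approx 8 \times 14.29 \approx 114.3 \text{ bits} && \text{(illustrative larger alphabet)}
\end{aligned}
```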

Some customers want it (http://mailman.mit.edu/pipermail/kerberos/2008-July/013923.html), and there is even a standard on how to do it right: SASLprep (http://tools.ietf.org/html/rfc4013).
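Python's standard library has no SASLprep function, but its `stringprep` module exposes the RFC 3454 tables that RFC 4013 builds on. What follows is a sketch of the profile's four steps: mapping, NFKC normalization against Unicode 3.2, a prohibited-output check, and a bidirectional check. Treat it as illustrative rather than a vetted implementation. RFC 4013 has since been obsoleted by the PRECIS OpaqueString profile (RFC 7613, later RFC 8265).

```python
import stringprep
import unicodedata

# RFC 3454 tables that RFC 4013 lists as prohibited output.
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties or deprecated
    stringprep.in_table_c9,       # tagging characters
)

def saslprep(s: str, allow_unassigned: bool = False) -> str:
    # 1. Mapping: drop "commonly mapped to nothing" (B.1), map non-ASCII spaces (C.1.2) to U+0020.
    mapped = "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in s
        if not stringprep.in_table_b1(ch)
    )
    # 2. Normalization: NFKC with the Unicode 3.2 database that stringprep is defined against.
    out = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # 3. Prohibited output, plus unassigned code points (A.1) for stored strings.
    for ch in out:
        if any(check(ch) for check in _PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
        if not allow_unassigned and stringprep.in_table_a1(ch):
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")
    # 4. Bidirectional check: RandALCat strings may not contain LCat
    #    and must start and end with RandALCat.
    if any(stringprep.in_table_d1(ch) for ch in out):
        if any(stringprep.in_table_d2(ch) for ch in out):
            raise ValueError("string mixes right-to-left and left-to-right characters")
        if not (stringprep.in_table_d1(out[0]) and stringprep.in_table_d1(out[-1])):
            raise ValueError("right-to-left string must start and end with a RandALCat character")
    return out
```

For example, `saslprep("I\u00adX")` returns "IX" because the soft hyphen is mapped to nothing. `saslprep("\u2168")` (ROMAN NUMERAL NINE) also returns "IX". Both mirror examples given in RFC 4013 itself.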

Mihai Nita
+3  A: 

So I'm wondering if there's some technical or usability issue that I'm overlooking.

There's a technical issue with non-ASCII passwords (and usernames, for that matter) with HTTP Basic Authentication. As far as I know the sites you mentioned don't generally use Basic Authentication, but it might be a hangover from systems that do.

The HTTP Basic Authentication standard defines a base64-encoded username:password token. This means that if you have a colon in the username or password, the result is ambiguous. Also, base64-decoding the token gives you only bytes, with no indication of how to convert those bytes to characters. And guess what? Different browsers use different encodings to do it.

  • Opera and Chrome use UTF-8.

  • IE uses the client system's default code page (which is of course never UTF-8) and mangles characters that don't fit in it using the Windows standard Try To Find A Character That Looks A Bit Like It, Or Maybe Just Not (Who Cares) algorithm.

  • Safari uses ISO-8859-1, and silently refuses to send any auth token at all when the username or password has characters that don't fit.

  • Mozilla takes the lowest 8 bits of the code point (similar to ISO-8859-1, but more broken). See bug 41489 for tortuous discussion with no outcome or progress.

So if you allow non-ASCII usernames or passwords then the Basic Authentication process will be at best complicated and inconsistent, with users wondering why it randomly works or fails when they use different computers or browsers.
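Here is a short Python sketch of why the same credentials can yield different Basic tokens. The byte encodings loosely model the behaviours listed above:

* UTF-8.
* A Windows code page (cp1252 is assumed here as a typical Western default).
* ISO-8859-1, which fails outright for "€".
* A hypothetical "lowest 8 bits" model.

The helper names are illustrative, and the code does not reproduce any browser's actual implementation.

```python
import base64
from typing import Dict, Optional

def basic_token(credentials: bytes) -> str:
    """Base64-encode raw "user:password" bytes, as carried in an Authorization: Basic header."""
    return base64.b64encode(credentials).decode("ascii")

def low_8_bits(text: str) -> bytes:
    """Hypothetical model of the Mozilla behaviour above: keep only the low byte of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)

def client_encodings(text: str) -> Dict[str, Optional[bytes]]:
    """Bytes different clients might produce for the same text; None where encoding fails."""
    result: Dict[str, Optional[bytes]] = {}
    for name in ("utf-8", "cp1252", "iso-8859-1"):
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None  # the text cannot be represented, so no token can be built
    result["low-8-bits"] = low_8_bits(text)
    return result

for name, raw in client_encodings("jürgen:pass€").items():
    print(name, basic_token(raw) if raw is not None else "(no token)")

# The colon problem: a server that splits at the first colon cannot tell these
# two different (user, password) pairs apart, because the tokens are identical.
assert basic_token(b"a:b" + b":" + b"c") == basic_token(b"a" + b":" + b"b:c")
print("a:b:c".partition(":"))  # ('a', ':', 'b:c')
```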

bobince
+1 for everyone's favorite: Try To Find A Character That Looks A Bit Like It, Or Maybe Just Not (Who Cares) algorithm.
huntaub
A: 

No. Restrict passwords to ASCII characters.

When you input a password, bullets are displayed to conceal the password.

But typing Japanese and many other languages requires an input method that converts keystrokes into the desired characters, and that requires you to see what the characters are.

Jon Reid