Adding support for Unicode passwords is an important feature that developers should not ignore.

Still, adding support for Unicode in passwords is tricky. The same text can be represented by different sequences of code points, and you don't want to prevent people from logging in because of this.

Let's say that you'll store the passwords as UTF-8. Mind that this question is not about Unicode encodings; it is about Unicode normalization.
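A short illustration of why UTF-8 alone is not enough (Python is used only for illustration; the variable names are hypothetical). The two spellings of "café" below are canonically equivalent, yet they differ as code points, so their UTF-8 bytes differ and so would any hash of them. Normalizing both to the same form, NFC in this sketch, makes them identical.

```python
import unicodedata

precomposed = "caf\u00e9"   # é as one code point, U+00E9
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT, U+0301

# Same visible text, different code points, therefore different UTF-8 bytes.
print(precomposed == decomposed)  # False
print(precomposed.encode("utf-8"), decomposed.encode("utf-8"))  # b'caf\xc3\xa9' b'cafe\xcc\x81'

# After normalizing both to NFC they compare equal.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)  # True
```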

So how should you normalize the Unicode data?

You have to be sure that the normalized form compares reliably, and that a future release of the Unicode Standard will not invalidate your password verification.
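On the stability concern: Unicode's normalization stability policy says that a string containing only characters assigned in a given version, once normalized under that version, stays normalized under every later version. Unassigned code points carry no such guarantee, because they may later be assigned with a decomposition. A minimal sketch, assuming you reject unassigned code points when a password is set (the same rule stringprep applies to stored strings); `prepare_for_enrollment` is a hypothetical name, and storing the Unicode version alongside the hash is an assumption, not a requirement.

```python
import unicodedata

def prepare_for_enrollment(password: str) -> tuple[str, str]:
    """Hypothetical helper: reject unassigned code points, then normalize to NFC."""
    for ch in password:
        # General category "Cn" = unassigned in this interpreter's Unicode tables
        # (noncharacters such as U+FFFF are also "Cn" and get rejected too).
        if unicodedata.category(ch) == "Cn":
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")
    # Return the Unicode version used, so it can be recorded with the hash.
    return unicodedata.normalize("NFC", password), unicodedata.unidata_version
```

Rejecting at enrollment rather than at login keeps the check cheap and means every stored hash was computed from assigned characters only.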

Note: there are still some places where Unicode passwords will probably never be used, but this question is not about why or when to use Unicode passwords; it is about how to implement them properly.

1st update

Is it possible to implement this without ICU, for example by using the operating system's normalization API?

A:

A good start is to read Unicode TR 15: Unicode Normalization Forms. Then you realize that it is a lot of work and prone to strange errors; you probably already know this part, since you are asking here. Finally, you download something like ICU and let it do the work for you.
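As one example of letting a library do the work: Python's standard `unicodedata` module ships its own copy of the Unicode data tables (it uses neither ICU nor the OS). The sketch below shows the kind of detail TR 15 covers. "ḋ" followed by a combining dot below decomposes, and the marks are then put in canonical order by their combining class (dot below, class 220, before dot above, class 230). The input string is illustrative only.

```python
import unicodedata

s = "\u1e0b\u0323"  # LATIN SMALL LETTER D WITH DOT ABOVE + COMBINING DOT BELOW

nfd = unicodedata.normalize("NFD", s)  # full decomposition + canonical reordering
nfc = unicodedata.normalize("NFC", s)  # NFD followed by canonical recomposition

# Canonical combining classes decide the order of the marks.
print(unicodedata.combining("\u0323"), unicodedata.combining("\u0307"))  # 220 230
print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0064', 'U+0323', 'U+0307']
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+1E0D', 'U+0307']
```

Typing the marks in the other order ("d", dot above, dot below) yields the same NFD result, which is exactly the property password comparison needs.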

IIRC, it is a multistep process. First you decompose the sequence until it cannot be decomposed any further; for example, é (U+00E9) becomes e (U+0065) followed by the combining acute accent (U+0301). Then you reorder the combining marks into a well-defined canonical order; together these two steps give Normalization Form D (NFD). Finally, you encode the resulting sequence of code points using UTF-8 or something similar. The UTF-8 byte stream can be fed into the cryptographic hash algorithm of your choice and stored in a persistent store. When you want to check whether a password matches, perform the same procedure and compare the output of the hash algorithm with what is stored in the database.
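A minimal sketch of that pipeline, with some choices swapped in as assumptions. It normalizes to NFC instead of stopping at NFD (either works if you use the same form every time), and it uses salted PBKDF2-HMAC-SHA256 from `hashlib` as the hash, since a deliberately slow password hash is preferable to a plain digest. The iteration count and the names `hash_password`/`verify_password` are hypothetical.

```python
import hashlib
import hmac
import os
import unicodedata

ITERATIONS = 600_000  # assumption: tune for your hardware and policy

def _prepare(password: str) -> bytes:
    # Normalize first, then encode, so equivalent inputs yield identical bytes.
    return unicodedata.normalize("NFC", password).encode("utf-8")

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # random per-user salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", _prepare(password), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    # Repeat the same procedure and compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", _prepare(password), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)
```

A password enrolled as "caf\u00e9" then verifies when typed as "cafe\u0301", because both normalize to the same bytes before hashing.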

D.Shawley
A: 

A question back to you: can you explain why you added "without using ICU"? I see a lot of questions asking for things that ICU does (we* think) pretty well, but "without using ICU". Just curious.

Secondly, you may be interested in StringPrep/NamePrep and not just normalization: StringPrep defines how to map strings before comparing them.
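For passwords specifically, the StringPrep profile is SASLprep (RFC 4013), which was later superseded by the PRECIS OpaqueString profile (RFC 8265). Python's `stringprep` module exposes the RFC 3454 tables but not a complete profile. The following is a rough sketch of SASLprep assembled from those tables, assuming the input is a stored string (so unassigned code points are rejected). `saslprep` is a hypothetical name, and this is not a vetted implementation.

```python
import stringprep
import unicodedata

def saslprep(s: str) -> str:
    """Sketch of the SASLprep profile (RFC 4013) built on RFC 3454 tables."""
    # 1. Map: non-ASCII spaces (C.1.2) become U+0020; B.1 characters are removed.
    mapped = "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in s
        if not stringprep.in_table_b1(ch)
    )
    # 2. Normalize with NFKC using the Unicode 3.2 data stringprep is defined on.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # 3. Prohibit the RFC 4013 output tables, plus unassigned code points (A.1).
    prohibited = (
        stringprep.in_table_c12, stringprep.in_table_c21_c22,
        stringprep.in_table_c3, stringprep.in_table_c4, stringprep.in_table_c5,
        stringprep.in_table_c6, stringprep.in_table_c7, stringprep.in_table_c8,
        stringprep.in_table_c9, stringprep.in_table_a1,
    )
    for ch in normalized:
        if any(check(ch) for check in prohibited):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    # 4. Bidi rules (RFC 3454 section 6): no mixing, and R/AL at both ends.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixes right-to-left and left-to-right characters")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("right-to-left string must start and end with R/AL")
    return normalized
```

The mapping step is what normalization alone does not give you: a soft hyphen or a no-break space in a pasted password is removed or turned into an ordinary space before hashing.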

Thirdly, you may be interested in UTR #36 (Unicode Security Considerations) and UTR #39 (Unicode Security Mechanisms) for other Unicode security implications.

*(disclosure: ICU developer :)

Steven R. Loomis
I have nothing against ICU, but in some cases its size can be a real issue. For this reason you may want to use an OS-specific API.
Sorin Sbarnea
If you're only using normalization, you can trim down the size pretty easily (both code and data). Also, ICU is often installed as a module. Thank you for the response.
Steven R. Loomis