- What restrictions should I impose on usernames? why?
- What restrictions should I not impose on usernames? why?
P.S. db is via best-practice PDO so no risk of sql injection
Thanks
P.S. db is via best-practice PDO so no risk of sql injection
Thanks
I've never seen the point in adding restrictions to usernames. If your code is resistant to sql injection attacks then let them put in anything they want.
The only restriction I'd add is a max length one so that it can be stored in a DB table
Let them use any Unicode character in their username. Adding restrictions on the allowed characters will probably just annoy people using a non-ascii language.
Depends on many things, for instance, if the users are going to have their own URL, you want to be careful that someone who creates the username "%41llan" doesn't clash with the user called "Allan", while allowing forward-slash may cause problems. Look out for those sorts of constraints.
SQL injection protection is a must, but that should probably be in your code, not in username restrictions. Certain characters should definitely be escaped, like \, %, etc.
It will on what kind of site you're running, but I think some obscene word restrictions would make your site look more professional no matter what. If someone sees that people are allowed to go around with "EXPLETIVE" as they're username, your site will look childish. Its like allowing teenagers to run rampid in your book store IMHO. You probably don't need to get much more picky than that, although its completely up to you.
This is slightly off topic, but as another piece of username advice, a great feature of any website is allowing users to change they're username over time. You can just have a number as a primary key, and allowing them to do this can save a lot of whining and people creating new accounts because they wanted to change their username. :D
OK, so let's assume you're doing all your string-encoding tasks right. You've not got any SQL injections, HTML injections, or places where you're not URL-encoding something you should. So we don't need to worry about characters like "<&%\ being magic in some contexts. And you're using UTF-8 for everything so all of Unicode is in play. What other reasons are there to limit usernames?
To start with, all control characters, for sanity. There is no reason to have characters U+0000 to U+001F ot U+007F to U+009F in a username.
Next, deny or normalise unexpected whitespace. You may want to allow a space in a username, but you almost certainly don't want to allow leading spaces, trailing spaces, or more than one space in a row. They may render the same in HTML, but are probably a user error that will confuse.
If you intend to allow that username to be used to login through HTTP Basic Authentication, you must disallow the :
character, because the Basic Auth scheme encodes a ‘username:password’ pair with no escaping if there's a colon in the username or password. So at least one of the username and password must have the colon excluded, and it's better that that's the username because restricting people's choice of passwords is a much worse thing than usernames.
For Basic Authentication you may also want to disable all non-ASCII characters, as they are handled differently by different browsers. IE encodes them using the system codepage; Firefox encodes them using ISO-8859-1; Opera encodes them using UTF-8. Users should at least be warned before choosing non-ASCII names if HTTP Auth is going to be available, as actually using them will be very unreliable.
Next consider other Unicode control sequences, things like the bidi overrides and other characters listed there are unsuitable for use in markup. Probably you are going to end up putting them in markup and you don't want someone with an RLO in their name to turn a load of the text in your page backwards.
Also, if you allow Unicode do normalisation on the strings you get. Otherwise someone may have a username with a composed o-umlaut character ö
, and wonder why they can't log in on a Mac, which by default would use a separate o
character followed by combining umlaut. It's usual to normalise to the composed form NFC on the web. You may also want to do compatibility decompositions by using the form NFKC; this would allow a user Chris to log in from a Japanese keyboard in fullwidth romaji mode typing Chris. These are general issues it is good to solve for all your webapp's input, but for identifiers like usernames it can be more critical to get right.
Finally, make sure the length is OK to fit in the database without a silent truncation changing the name, especially if you are storing as UTF-8 bytes which you don't want to get snipped halfway through a byte sequence. Username truncations can also be a security issue in general.
If you are using usernames as a unique means of identification, you have much more to worry about: the already-mentioned problem of lookalikes such as Сhris
(with a Cyrillic Es С
). There are too many of these for you to handle reasonably; either restrict to ASCII or have an additional means of identifying users. (Or don't care, like SO doesn't; when I can easily call myself Chris anyway I have no need to call myself С
-hris.)