Although this seems like a trivial question, I am quite sure it is not :)
I need to validate names and surnames of people from all over the world. How can I do that with a regular expression? If it were only English ones I think that this would cut it:
^[a-zA-Z '-]+$
However, I also need to support these cases:
- other punctuation symbols as they might be used in different countries (no idea which, but maybe you do!)
- different Unicode letter sets (accented letters, Greek, Japanese, Chinese, and so on)
- no numbers, symbols, unnecessary punctuation, runes, etc.
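To make the requirements above concrete, here is a minimal sketch in Python that checks each character's Unicode general category rather than using a single regex. The function name `is_plausible_name` and the `ALLOWED_PUNCTUATION` set are assumptions made for illustration, not an established standard. Note also that the general category does not distinguish scripts, so runic letters would still pass and would need a separate script-level check.

```python
import unicodedata

# Assumption: the only non-letter characters allowed inside a name.
# U+2019 is the typographic apostrophe many keyboards and phones produce.
ALLOWED_PUNCTUATION = {" ", "-", "'", "\u2019"}


def is_plausible_name(value: str) -> bool:
    """Return True if value contains only letters, combining marks,
    and the small punctuation set above, with at least one letter."""
    # Normalise so that "e" + combining accent and the precomposed
    # character are treated the same way.
    value = unicodedata.normalize("NFC", value.strip())
    has_letter = False
    for ch in value:
        category = unicodedata.category(ch)
        if category.startswith("L"):      # Lu, Ll, Lt, Lm, Lo: letters
            has_letter = True
        elif category.startswith("M"):    # Mn, Mc, Me: combining marks
            continue
        elif ch in ALLOWED_PUNCTUATION:
            continue
        else:                             # digits, symbols, other punctuation
            return False
    return has_letter
```

With this sketch, `is_plausible_name("O'Brien")` and `is_plausible_name("山田")` are accepted, while `is_plausible_name("Jennifer 8 Lee")` and `is_plausible_name("<script>")` are rejected.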
Is there a standard way of validating these fields I can implement to make sure that our website visitors have a great experience and can actually use their name when registering?
I would be looking for something similar to the many "email address" regexes that you can find on Google.
For the sake of clarity, I don't need one single regex for the "whole" name. I would expect users to be able to split their name into its two main constituents according to their customs, and not to use suffixes and titles -- which could be contained in other fields if need be.
The main purpose of the question is to validate against XSS and SQL-injection (yes, I already use stored procedures, but I need to future- and idiot-proof the data).
Any sound XSS filter works by allowing only what is strictly necessary -- not by disallowing known XSS vectors (e.g. disallowing "script", "<", etc.). To get an idea of the incredible variety of attacks that can be used, take a look here: http://ha.ckers.org/xss.html.
Sorry for not mentioning this before, and thus making the question a bit more mysterious, but I didn't want to read 30 answers that boil down to "disallow the < or > and you are safe!".
See here for a good starting point on Unicode character classes in C# Regexes -- which of these are strictly necessary for writing a name? I honestly have no idea which, but possibly the collective mind of Stack Overflow can help?
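To illustrate the difference between the two approaches, here is a small Python sketch. The deny-list tokens, the allowed punctuation, and both function names are hypothetical and chosen only for this comparison. The payload contains neither "script" nor angle brackets, so the deny-list lets it through, while the allow-list rejects it because quotes, `=`, parentheses and digits are simply not on the list.

```python
import unicodedata

# Hypothetical deny-list of "known bad" fragments.
DENY_LIST = ("script", "<", ">")

# Hypothetical allow-list: letters, combining marks, and these characters.
ALLOWED_EXTRA = {" ", "-", "'"}


def passes_deny_list(value: str) -> bool:
    """Accept anything that does not contain a known-bad fragment."""
    lowered = value.lower()
    return not any(token in lowered for token in DENY_LIST)


def passes_allow_list(value: str) -> bool:
    """Accept only values made entirely of explicitly allowed characters."""
    return bool(value) and all(
        unicodedata.category(ch)[0] in "LM" or ch in ALLOWED_EXTRA
        for ch in value
    )


# An attribute-injection payload with no "<", ">" or "script" in it.
payload = '" onfocus="alert(1)" autofocus="'

deny_result = passes_deny_list(payload)    # the deny-list misses it
allow_result = passes_allow_list(payload)  # the allow-list rejects it
```

The apostrophe stays on the allow-list because names such as O'Brien need it, so a check like this complements parameterized queries and output encoding rather than replacing them.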
(I am prepared to force people like Jennifer 8 Lee to write their name in letters ;-)
So, I did "bother" to do it myself, because I think nobody else even tried. Guess what? Apparently I did find a proper answer, posted below! It wasn't that hard.
Can you help me find a valid, existing name or an XSS vector that can break that validation?