views:

611

answers:

7

Hey, I have a question regarding regexps in general. I'm currently building a registration form where you can enter your full name (given name and family name). However, I can't use [a-zA-Z] as a validation check, because that would exclude everyone with a "foreign" character in their name.

What is the best way to make sure that they don't enter a symbol, in both PHP and JavaScript?

Thanks in advance!

+2  A: 

The correct solution to this problem (in general) is POSIX character classes. In particular, you should be able to use [:alpha:] (or [:alnum:]) to do this. Note that these classes are only valid inside a bracket expression, e.g. [[:alpha:]].

Though why do you want to prevent users from entering their name exactly as they type it? Are you sure you're in a position to tell them exactly what characters are allowed to be in their names?

Andrzej Doyle
I just want to make sure they use a real name instead of numbers and symbols. It's kind of obvious that a person won't be named #!#"%12=1.
+1  A: 

You first need to conceptually distinguish between a "foreign" character and a "symbol." You may need to clarify here.

Accounting for other languages means accounting for other code pages, and that is really beyond the scope of a simple regexp. It can be done, but the code pages have to be handled correctly at a higher level first.

tkotitan
A: 

I don't know how you would decide what is valid or not, and depending on your global reach, you will probably not be able to exclude anything without locking somebody out. That said, a Google search turned up this article, which may be helpful:

http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_symbol_characters_web_page

ryanday
A: 

You could loop through the input string and use the String.charCodeAt() function to get the integer character code for each character. Set yourself up with a range of acceptable characters and do your comparison.
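
The loop described above could be sketched roughly as follows. The ranges and the extra separator characters are assumptions chosen purely for illustration (basic Latin, Latin-1 letters, and the Cyrillic block); `ALLOWED_RANGES`, `ALLOWED_SINGLE`, and `isAllowedName` are hypothetical names, and a real form would need a much wider list.

```javascript
// Hypothetical allow-list of [low, high] UTF-16 code unit ranges.
// These ranges are illustrative assumptions, not a complete list.
const ALLOWED_RANGES = [
  [0x0041, 0x005a], // A-Z
  [0x0061, 0x007a], // a-z
  [0x00c0, 0x00d6], // Latin-1 letters before the multiplication sign
  [0x00d8, 0x00f6], // Latin-1 letters between the multiplication and division signs
  [0x00f8, 0x00ff], // Latin-1 letters after the division sign
  [0x0400, 0x04ff], // Cyrillic block
];

// Single characters allowed in addition to letters (assumption):
// space, apostrophe, hyphen.
const ALLOWED_SINGLE = [0x20, 0x27, 0x2d];

function isAllowedName(name) {
  if (name.length === 0) return false;
  for (let i = 0; i < name.length; i++) {
    // charCodeAt returns the UTF-16 code unit at position i.
    const code = name.charCodeAt(i);
    const inRange = ALLOWED_RANGES.some(
      ([low, high]) => code >= low && code <= high
    );
    if (!inRange && !ALLOWED_SINGLE.includes(code)) return false;
  }
  return true;
}
```

Because this compares individual UTF-16 code units, it assumes precomposed characters and does not handle characters outside the Basic Multilingual Plane.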

Josh Stodola
\S allows all the symbols and dingbats as well, so this is of limited use.
Richard
You are correct. Changed answer.
Josh Stodola
A: 

As noted, POSIX character classes are likely the best bet. But their support (and the available alternatives) varies a great deal between regex flavours.

PHP apparently does support them, but JavaScript does not.

This means that for JavaScript you will need to use character ranges: /[\u0400-\u04FF]/ matches any one Cyrillic character. Clearly this will take some writing, but note that the XML 1.0 Recommendation (from W3C) includes a listing of a lot of ranges, albeit a few years old now.
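
As a rough sketch of the range approach, a client-side check might look like the following. The set of ranges is an assumption for illustration only (basic Latin, Latin-1 letters, Cyrillic), as is the choice to allow a single space, apostrophe, or hyphen between name parts; `LETTERS`, `NAME_PATTERN`, and `looksLikeName` are hypothetical names.

```javascript
// Letter ranges to accept (illustrative assumption, far from complete).
// The double backslashes leave \uXXXX escapes in the pattern source,
// where the regex engine interprets them.
const LETTERS =
  "A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF\\u0400-\\u04FF";

// One or more letters, optionally followed by further groups of letters
// that are each separated by a single space, apostrophe, or hyphen.
const NAME_PATTERN = new RegExp(`^[${LETTERS}]+(?:[ '-][${LETTERS}]+)*$`);

function looksLikeName(input) {
  return NAME_PATTERN.test(input.trim());
}
```

A check like this is only a first filter on the client; any name written in a script outside the listed ranges would be rejected, so the list has to grow with the audience.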

One approach might be to have a limited check on the client in JavaScript, and the full check only server side.

Richard
+1  A: 

If you strictly wanted your regexp to fail on punctuation and symbols, you could use [^[:punct:]], but I'm not sure how the [:punct:] POSIX class reacts to some of the weird Unicode symbols. This would of course stop someone from putting in "John Smythe-Jones" as their name (as '-' is a punctuation character), so I would probably advise against using it.
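
JavaScript has no [:punct:], but the same pitfall can be illustrated with Unicode property escapes, which are a different mechanism (available in engines that support ES2018 and the `u` flag). This is a sketch under that assumption; `PUNCT_OR_SYMBOL` and `hasNoPunctuation` are hypothetical names.

```javascript
// \p{P} matches Unicode punctuation and \p{S} matches Unicode symbols.
// Together they roughly approximate what [:punct:] covers for ASCII.
const PUNCT_OR_SYMBOL = /[\p{P}\p{S}]/u;

function hasNoPunctuation(name) {
  // True only when no punctuation or symbol character appears anywhere.
  return !PUNCT_OR_SYMBOL.test(name);
}
```

With this check, "John Smith" is accepted, but "John Smythe-Jones" and "O'Brien" are rejected along with "#!#"%12=1", which is exactly the problem described above.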

David Dean
+1  A: 

I don’t think that’s a good idea. See How to check real names and surnames - PHP

Gumbo