I have an input field in a form. When the form is submitted, I want to validate that the user entered only non-Latin characters, i.e. characters from other scripts such as Chinese, among many others. At the very least, I want to make sure the input does not contain any Latin characters.

Could I use a regular expression for this? What would be the best approach for this?

I am validating in both JavaScript and PHP. What solutions can I use to check for foreign characters in the input field in both languages?

+2  A: 

In PHP, you can check the Unicode script property Latin, written \p{Latin}. That's probably closest to what you want.

So if preg_match('/\p{Latin}/u', $subject) returns 1, then there is at least one Latin character in your $subject. See also this reference.

JavaScript didn't support this before ES2018, which added \p{Script=Latin} with the u flag; in older engines you'd have to construct the valid Unicode ranges manually.
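
In engines that implement ES2018 Unicode property escapes, a minimal sketch of the same check in JavaScript might look like the following. The function name containsLatin is hypothetical, and the sketch assumes the runtime accepts \p{Script=Latin} with the u flag (older engines throw a SyntaxError on the literal).

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes.
// containsLatin is a hypothetical helper name, not from the thread.
function containsLatin(text) {
  // \p{Script=Latin} matches any code point in the Latin script,
  // including accented letters such as "é"; the u flag is required.
  return /\p{Script=Latin}/u.test(text);
}

console.log(containsLatin("你好"));      // false
console.log(containsLatin("你好 abc"));  // true
console.log(containsLatin("é"));         // true
```

Spaces, digits, and punctuation belong to the Common script, so they do not count as Latin here.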

Tim Pietzcker
All I am trying to do is to test to make sure the characters in the field are non-latin characters.
zeckdude
Define "non-latin characters".
Tim Pietzcker
Please see the comment I left above.
zeckdude
+1  A: 

In JavaScript, at least, you can use Unicode escapes (hex codes) inside character classes to express a range:

var rlatins = /[\u0000-\u007f]/;

You can then test whether a string contains any characters in that range like this:

if (rlatins.test(someString)) {
  alert("ROMANI ITE DOMUM");
}
Pointy
This will trigger on punctuation and other characters besides Latin letters.
Tim Pietzcker
Yes, it will. It was unclear in the original question what was intended. I took the guess that checking /[A-Za-z]/ was so obvious that it couldn't possibly be the problem :-)
Pointy
The suggestion you made works really well, except that as soon as I add a period or space or anything like that in the field along with foreign characters, then it validates as being latin. How can I change your suggestion to allow for other characters like that, but still not allow any normal letters? Thanks for your help!
zeckdude
Well, the idea is to make a regular expression that covers the characters that you *don't* want. Thus, you can change the "range" so that the characters that are "ok" (in the nominal "Latin" range) are excluded. Just change the broad "everything" range in my example so that it skips the characters you want to allow.
Pointy
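
One way to read the suggestion above, as a sketch only: narrow the range so that it covers just the ASCII letters, leaving spaces, digits, and punctuation free. The name rletters is hypothetical, and which characters to allow is an assumption about what the asker wants.

```javascript
// Hypothetical narrowing of the broad \u0000-\u007f range:
// \u0041-\u005a is A-Z and \u0061-\u007a is a-z, so spaces,
// digits, and punctuation no longer trigger a match.
var rletters = /[\u0041-\u005a\u0061-\u007a]/;

console.log(rletters.test("你好. 123"));   // false
console.log(rletters.test("你好 hello"));  // true
```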
+1  A: 

You're trying to check if all letters are not Latin, but you do accept accented letters.

A simple solution is to validate the string using the regex (this is useful if you have a validation plugin):

/^[^a-z]+$/i
  • ^...$ - match from start to end
  • [^...] - any character that is not
  • a-z - A through Z,
  • + - at least one such character
  • /i - ignoring case (could also be written /^[^a-zA-Z]+$/)

Another option is simply to look for a letter:

/[a-z]/i

This regex will match if the string contains a letter, so you can treat a match as invalid input.

In JavaScript you can check that easily with if:

var s = "שלום עולם";
if (s.match(/^[^a-z]+$/i)) {
  // s is non-empty and contains no Latin letters
}

or

if (!s.match(/[a-z]/i)) { /* s contains no Latin letters */ }
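
The two checks are not quite interchangeable, which a small sketch can show. The helper names below are hypothetical; the regular expressions are the two given in this answer. The difference shows up on the empty string, because the first pattern requires at least one character.

```javascript
// Hypothetical helper names; both regexes are the ones from this answer.
function onlyNonLatinStrict(s) {
  // Whole string must be one or more non-letters, so "" fails.
  return /^[^a-z]+$/i.test(s);
}

function hasNoLatinLetter(s) {
  // Only rejects strings containing a-z or A-Z, so "" passes.
  return !/[a-z]/i.test(s);
}

console.log(onlyNonLatinStrict("שלום עולם"), hasNoLatinLetter("שלום עולם")); // true true
console.log(onlyNonLatinStrict("shalom"), hasNoLatinLetter("shalom"));       // false false
console.log(onlyNonLatinStrict(""), hasNoLatinLetter(""));                   // false true
```

If the field is required, the first form rejects empty input in the same step; otherwise the second form is the more lenient choice.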

PHP has a different syntax, and because client-side JavaScript checks can be bypassed, the server-side PHP check is the one to rely on, but the regular expressions are the same.

Kobi