In Python I could have converted the string to Unicode and run a '(?u)^[\w ]+$' regex search, but PHP doesn't seem to understand an international \w. Or does it?

+4  A: 

Although I haven't tested it myself, the PCRE pattern syntax page at http://us3.php.net/manual/en/reference.pcre.pattern.syntax.php suggests that '/^[\p{L} ]+$/u' would work: \p{L} matches any Unicode letter, and the /u modifier makes PCRE treat the pattern and the subject as UTF-8. You can apparently also write it without the curly brackets: '/^[\pL ]+$/u'.
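To make the suggestion concrete, here is a minimal sketch that wraps the pattern in a helper. The function name isLettersAndSpaces is hypothetical and only for illustration, and the sketch assumes the source file and the input strings are UTF-8 encoded.

```php
<?php
// Hypothetical helper (not part of PHP): true when $text contains
// only Unicode letters and spaces, and at least one character.
function isLettersAndSpaces(string $text): bool
{
    // \p{L} matches any Unicode letter; the /u modifier tells PCRE to
    // treat both the pattern and the subject as UTF-8.
    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error (for example, when the subject is not valid UTF-8).
    return preg_match('/^[\p{L} ]+$/u', $text) === 1;
}

var_dump(isLettersAndSpaces('привет мир'));      // Cyrillic letters and a space
var_dump(isLettersAndSpaces('hello_world 42'));  // underscore and digits are not letters
```

Comparing against 1 with === means invalid UTF-8 input is rejected rather than mistaken for a match. Note that without the D modifier, $ also matches just before a trailing newline, so trim the input first if that matters.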

John Fiala
Great! Worked like a charm: print preg_match('/^[\p{L} ]+$/u', 'привет мир');
Slava N
Works without curlies too.
Slava N
+1  A: 

AFAIK PHP isn't aware of UTF-8, meaning that PHP itself can only process it bytewise.

PHP believes everything is Latin-1; there are, however, extensions that might be useful to you, like mbstring.

http://se.php.net/mbstring
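
As a rough illustration of the bytewise point, the sketch below compares the built-in strlen() with mb_strlen() from mbstring. It assumes the source file is saved as UTF-8 and guards the mbstring call, since the extension may not be installed.

```php
<?php
// Six Cyrillic letters; each one takes two bytes in UTF-8.
$word = 'привет';

// strlen() counts bytes, not characters.
var_dump(strlen($word));

// mb_strlen() counts characters once it is told the encoding.
// It is only available when the mbstring extension is loaded.
if (function_exists('mb_strlen')) {
    var_dump(mb_strlen($word, 'UTF-8'));
}
```

Under those assumptions, strlen() reports 12 and mb_strlen() reports 6, which is the gap between byte-oriented and character-oriented string handling.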

jishi
A: 

Getting Unicode working properly everywhere in the code base is one of the "big" features of PHP6.

Until then, the word is that you are recommended NOT to use Unicode in PHP because of the numerous security problems that can develop from it.

A lot of the code just isn't Unicode-aware and thus isn't safe, and exploits can get through it in ways that are really unpleasant.

Kent Fredric
After \ as a namespace separator, I don't trust anything in PHP, and I'm certainly not going to wait until PHP6 :) But thanks for the warning.
Slava N