I need a true/false value that tells me whether a string contains anything other than letters from the European and Eastern alphabets, the space character " ", and the hyphen-minus "-". How can I do this for a string $a?

+1  A: 

It sounds like you may be tackling a character set issue the wrong way, but I could be wrong. A few gotchas that have tripped me up in the past:

  • DOMDocument processes everything internally in UTF-8, regardless of the input encoding.
  • An included file saved in a different character set.
  • The database not returning Unicode because of a missing "SET NAMES.." instruction.

What's the code supposed to achieve? If we can look past your question to the next step, there may be a better solution out there.

danp
+1  A: 

Here's what regular-expressions.info has to say on the subject of Unicode and PHP:

Regular expressions on PHP

The most important set of regex functions start with preg. These functions are a PHP wrapper around the PCRE library (Perl-Compatible Regular Expressions). Anything said about the PCRE regex flavor in the regular expression tutorial on this website applies to PHP's preg functions. You should use the preg functions for all new PHP code that uses regular expressions.

A special option is the /u which turns on the Unicode matching mode, instead of the default 8-bit matching mode. You should specify /u for regular expressions that use \x{FFFF}, \X or \p{L} to match Unicode characters, graphemes, properties or scripts. PHP will interpret '/regex/u' as a UTF-8 string rather than as an ASCII string.
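
To make the effect of /u concrete, here is a minimal sketch (not part of the quoted tutorial) that matches the same two-byte UTF-8 string with and without the modifier. The variable names are only illustrative, and the string is written with byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// "é" encoded as UTF-8 is two bytes: 0xC3 0xA9.
$s = "\xC3\xA9";

// Without /u, "." matches a single byte, so the two-byte string
// does not count as exactly one character.
$byteMode = preg_match('/^.$/', $s);

// With /u, the subject is treated as UTF-8 and "." matches one code point.
$unicodeMode = preg_match('/^.$/u', $s);

// With /u, a subject that is not valid UTF-8 makes preg_match() fail
// with false rather than return 0. "\xC3" alone is a truncated sequence.
$invalid = @preg_match('/^.$/u', "\xC3");

var_dump($byteMode);    // int(0)
var_dump($unicodeMode); // int(1)
var_dump($invalid);     // bool(false)
```

Because of the last case, it is usually worth comparing the result with === 1 rather than relying on truthiness alone.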


Unicode support

The Unicode standard places each assigned code point (character) into one script. A script is a group of code points used by a particular human writing system. Some scripts like Thai correspond with a single human language. Other scripts like Latin span multiple languages.

Very few regular expression engines support Unicode scripts today. Of all the flavors discussed in this tutorial, only the JGsoft engine, Perl and PCRE can match Unicode scripts. Here's a complete list of all Unicode scripts:

  • \p{Cyrillic}
  • [...rest omitted]

Therefore, if you want to see if a string consists only of letters in Cyrillic in PHP, you can try to match it against this regular expression:

/^\p{Cyrillic}*$/u
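
As a rough sketch of how that pattern might be used from PHP: the helper name isCyrillicOnly is made up for illustration, the type declarations assume PHP 7 or later, and the file is assumed to be saved as UTF-8. Note that the pattern above uses "*", which also accepts the empty string; the sketch uses "+" to reject it.

```php
<?php
// Hypothetical helper, not from the quoted tutorial.
function isCyrillicOnly(string $s): bool
{
    // "+" instead of "*" so that the empty string is rejected.
    // /u makes PCRE treat both pattern and subject as UTF-8.
    return preg_match('/^\p{Cyrillic}+$/u', $s) === 1;
}

var_dump(isCyrillicOnly("Привет"));     // bool(true)
var_dump(isCyrillicOnly("Привет мир")); // bool(false): the space is not in the Cyrillic script
var_dump(isCyrillicOnly("Privet"));     // bool(false): Latin letters
var_dump(isCyrillicOnly(""));           // bool(false)
```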
polygenelubricants
+2  A: 

Try this:

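if (preg_match('/^[\p{L&} -]+$/u', $a)) {
  # Only cased letters, spaces and hyphens
} else {
  # Empty string, or some other character is present
}

\p{L} matches any letter in any script, including caseless scripts and ideographs (e.g. Chinese characters). \p{L&} is narrower: it matches only cased letters (Unicode categories Lu, Ll and Lt), which covers alphabets such as Latin, Greek, Cyrillic and Armenian, but not caseless scripts such as Hebrew, Arabic or Thai. If your "east alphabets" include those, use \p{L} in the character class instead.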
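
To see how the two letter classes behave on real input, here is a small sketch. The function name and its $letterClass parameter are hypothetical, the type declarations assume PHP 7 or later, and the file is assumed to be saved as UTF-8. PCRE defines \p{L&} as the cased letters (Lu, Ll, Lt), so caseless scripts only pass with \p{L}.

```php
<?php
// Hypothetical helper: true when $a consists only of letters, spaces and hyphens.
// $letterClass is spliced into the character class, so pass only a trusted
// property escape such as '\p{L}' or '\p{L&}'.
function onlyLettersSpacesHyphens(string $a, string $letterClass = '\p{L}'): bool
{
    // The hyphen is last in the class, so it is a literal, not a range.
    $pattern = '/^[' . $letterClass . ' -]+$/u';
    return preg_match($pattern, $a) === 1;
}

var_dump(onlyLettersSpacesHyphens("Jean-Luc Пикар"));         // bool(true)
var_dump(onlyLettersSpacesHyphens("Jean-Luc Пикар", '\p{L&}')); // bool(true): all cased letters
var_dump(onlyLettersSpacesHyphens("ไทย"));                     // bool(true): Thai letters are in \p{L}
var_dump(onlyLettersSpacesHyphens("ไทย", '\p{L&}'));           // bool(false): Thai has no case
var_dump(onlyLettersSpacesHyphens("abc123"));                  // bool(false): digits are not letters
var_dump(onlyLettersSpacesHyphens(""));                        // bool(false): "+" needs at least one character
```

The question asks for the opposite value (true when something else is present), which is simply the negation: !onlyLettersSpacesHyphens($a).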

Jan Goyvaerts