I have the string Verbesserungsvorschläge, which I think is German, and I want to match it with a regex in PHP. More generally, I want to match characters, like the German umlauts, that are outside the ASCII set.

Thanks.

A: 

It's a world of hurt, but for simple extended characters you can try using the hex value of the byte, as in "/Verbesserungsvorschl\xE4ge/" for Latin-1 data. (In UTF-8, ä is the two bytes \xC3\xA4.)

The hex values can be found in a table or determined on the fly with

echo bin2hex('ä'); // prints e4 if this file is saved as Latin-1, c3a4 if it is saved as UTF-8

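For full Unicode, you can use the u modifier (written after the closing delimiter, as in /.../u), which makes the pattern and the subject be treated as UTF-8. See http://www.php.net/manual/en/regexp.reference.unicode.php and other pages. My understanding is that Unicode will work better in PHP version 6.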
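As a rough sketch of the u modifier in use, the snippet below assumes the subject string is valid UTF-8 and PHP 7 or later (for the "\u{...}" escape). The helper name nonAsciiChars is made up for illustration and is not part of PHP.

```php
<?php
// Assumption: the subject is UTF-8. "\u{00E4}" is the PHP 7+ escape for "ä".
$subject = "Verbesserungsvorschl\u{00E4}ge";

// Hypothetical helper: returns the non-ASCII characters found in a UTF-8 string.
function nonAsciiChars(string $utf8): array
{
    // With the u modifier, [^\x00-\x7F] matches one whole code point, not one byte.
    if (preg_match_all('/[^\x00-\x7F]/u', $utf8, $m) === false) {
        // preg_match_all() returns false on error, e.g. when the subject is not valid UTF-8.
        return [];
    }
    return $m[0];
}

// \p{L} matches any Unicode letter, so the whole word matches.
var_dump(preg_match('/^\p{L}+$/u', $subject) === 1);

// Lists the characters outside the ASCII range.
print_r(nonAsciiChars($subject));
```

If the data is actually Latin-1, as in the comment below, the u modifier will reject it as invalid UTF-8, so the helper returns an empty array rather than a match.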

Devin Ceartas
Thanks for the Unicode link. Unfortunately, in my case the data is latin1 (collation latin1_swedish_ci).
Shawn
A: 
preg_match_all('~[^\x00-\x7F]~u', 'Verbesserungsvorschläge', $matches);
Geert
+1  A: 

If you're working with an 8-bit character set, the regex [\x80-\xFF] matches any character that is not ASCII. In PHP that would be:

if (preg_match('/[\x80-\xFF]/', $subject)) {
  # String has non-ASCII characters
} else {
  # String is pure ASCII or empty
}
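A slightly fuller sketch of the same idea, assuming the data is ISO-8859-1 (Latin-1), where ä is the single byte 0xE4. The helper name hasNonAsciiByte is hypothetical, and the last step assumes the iconv extension is available.

```php
<?php
// Assumption: the subject is ISO-8859-1 (Latin-1), where "ä" is the single byte 0xE4.
$latin1 = "Verbesserungsvorschl\xE4ge";

// Hypothetical helper: true if a single-byte-encoded string contains a byte outside ASCII.
function hasNonAsciiByte(string $bytes): bool
{
    // No u modifier here, so the pattern works on raw bytes.
    return preg_match('/[\x80-\xFF]/', $bytes) === 1;
}

var_dump(hasNonAsciiByte($latin1));     // bool(true)
var_dump(hasNonAsciiByte('Vorschlag')); // bool(false)

// Match the literal word by spelling "ä" as its Latin-1 byte.
var_dump(preg_match('/Verbesserungsvorschl\xE4ge/', $latin1) === 1);

// To use the u modifier instead, convert the data to UTF-8 first.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
var_dump(preg_match('/^\p{L}+$/u', $utf8) === 1);
```

The byte-range check only makes sense for single-byte encodings; on UTF-8 input it would report each byte of a multi-byte character separately.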
Jan Goyvaerts