I have this string: Verbesserungsvorschläge, which I think is German. I want to match it with a regex in PHP. More generally, I want to match characters, like those used in German, that fall outside the ASCII set.
Thanks.
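It's a world of hurt, but you can try using the hex value for simple extended characters, as in "/Verbesserungsvorschl\xe4ge/" if the string is ISO-8859-1. In UTF-8, ä is stored as two bytes, so the pattern would be "/Verbesserungsvorschl\xc3\xa4ge/".
The hex values can be found in a table or determined on the fly with
echo bin2hex('ä'); // prints every byte of the character in the source file's encoding; ord() would only see the first byte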
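To see why a single hex escape may not be enough, here is a minimal sketch. It assumes the subject is UTF-8 and writes ä as explicit byte escapes so the result does not depend on how the script file is saved; the variable name $utf8 is just for illustration.

```php
<?php
// "ä" written as its two UTF-8 bytes (C3 A4).
$utf8 = "Verbesserungsvorschl\xC3\xA4ge";

// ord() only looks at the first byte of its argument.
echo dechex(ord("\xC3\xA4")), "\n";   // c3
// bin2hex() shows every byte.
echo bin2hex("\xC3\xA4"), "\n";       // c3a4

// Without /u the pattern is matched byte by byte, so both bytes are needed.
var_dump(preg_match('/Verbesserungsvorschl\xC3\xA4ge/', $utf8)); // int(1)
var_dump(preg_match('/Verbesserungsvorschl\xC3ge/', $utf8));     // int(0)
```

If the data were ISO-8859-1 instead, ä would be the single byte E4 and the one-escape form \xE4 would be the right one.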
For full Unicode, you can use /u as a modifier, which makes PCRE treat both the pattern and the subject as UTF-8. See http://www.php.net/manual/en/regexp.reference.unicode.php and other pages. My understanding is that Unicode will work better in PHP version 6.
preg_match_all('~[^\x00-\x7F]~u', 'Verbesserungsvorschläge', $matches);
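As a rough illustration of what the /u modifier buys you, the sketch below assumes a UTF-8 subject (again written with explicit byte escapes) and uses the Unicode property \p{L} to pick out whole words. The sample sentence and variable names are made up for the example.

```php
<?php
// Subject given as explicit UTF-8 bytes ("ä" = C3 A4).
$subject = "Neue Verbesserungsvorschl\xC3\xA4ge bitte";

// \p{L} matches any Unicode letter; /u makes PCRE read the pattern
// and the subject as UTF-8.
preg_match_all('/\p{L}+/u', $subject, $words);
print_r($words[0]);    // Neue, Verbesserungsvorschläge, bitte

// Only the non-ASCII characters, using the same pattern as above.
preg_match_all('/[^\x00-\x7F]/u', $subject, $nonAscii);
print_r($nonAscii[0]); // a single element: ä
```

With /u each match is a complete character, so ä comes back as one two-byte string rather than two separate bytes.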
If you're working with an 8-bit character set, the regex [\x80-\xFF]
matches any character that is not ASCII. In PHP that would be:
if (preg_match('/[\x80-\xFF]/', $subject)) {
    # String has non-ASCII characters
} else {
    # String is pure ASCII or empty
}
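The byte-wise pattern and the /u pattern behave differently depending on the encoding of the subject. The following sketch compares them on the same word stored as ISO-8859-1 and as UTF-8; the two sample strings are assumptions for illustration.

```php
<?php
$latin1 = "Verbesserungsvorschl\xE4ge";     // "ä" as one ISO-8859-1 byte
$utf8   = "Verbesserungsvorschl\xC3\xA4ge"; // "ä" as two UTF-8 bytes

// Byte-wise pattern (no /u): counts bytes in the range 80-FF.
echo preg_match_all('/[\x80-\xFF]/', $latin1), "\n"; // 1
echo preg_match_all('/[\x80-\xFF]/', $utf8), "\n";   // 2

// With /u the UTF-8 string yields one character...
echo preg_match_all('/[^\x00-\x7F]/u', $utf8), "\n"; // 1
// ...while the ISO-8859-1 string is rejected as invalid UTF-8.
var_dump(preg_match_all('/[^\x00-\x7F]/u', $latin1)); // bool(false)
```

So the byte-wise test is enough to detect non-ASCII content in either encoding, but to extract or count whole characters in UTF-8 data the /u form is the safer choice.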