$string1 = preg_replace('/[^A-Za-z0-9äöü!&_=\+-]/', ' ', $string4);

This regex is not supposed to replace the characters ä, ö and ü. In Ruby it works as expected, but in PHP it also replaces ä, ö and ü.

Can someone give me a hint how to fix it?

A: 

Unicode support is one of the features promised for PHP 6.

Currently in php5

JapanPro
Ah, really? I wasn't aware of that. How can I work around it?
ndi
That's Unicode support for the code itself (Unicode variable names, etc.). PHP does support working with UTF-8 strings at present, although there are a bunch of gotchas, and the native string functions typically don't work well with them.
ircmaxell
But there are some functions that are already supported in PHP 5; check the update.
JapanPro
The mb functions cause problems with the output.
ndi
That's right, you can't rely on them: most of the time they work, but sometimes they fail and you don't know why. That's why my first line is "Unicode support is one of the features promised for PHP 6." @ndi, have you tried Perl instead?
JapanPro
No, I don't know Perl. I'll go with Ruby instead. Thanks for the help.
ndi
A: 

I think this should work:

$string1 = preg_replace('/[^A-Za-z0-9\pL!&_=\+-]/u', ' ', $string4);
revaxarts
This is also not working.
ndi
'/[^\pL]/u', where the L stands for "Letters". Maybe you have to add the other chars as well. This requires PHP > 5.1 (http://de2.php.net/manual/en/regexp.reference.unicode.php).
revaxarts
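A minimal sketch of the \pL approach, assuming the input is valid UTF-8. The sample string is made up for illustration and is written with byte escapes so the result does not depend on how the source file is saved. Note that \pL keeps every Unicode letter, not only ä, ö and ü, so it is broader than the pattern in the question.

```php
<?php
// Hypothetical sample input: "naïve café #1" encoded as UTF-8.
$input = "na\xC3\xAFve caf\xC3\xA9 #1";

// \pL matches any Unicode letter; the u modifier makes PCRE read both the
// pattern and the subject as UTF-8.
$clean = preg_replace('/[^\pL0-9!&_=+-]/u', ' ', $input);

var_dump($clean);
```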
+2  A: 

Set the u pattern modifier, which tells PHP to treat both the pattern and the subject string as UTF-8.

'/[^A-Za-z0-9äöü!&_=\+-]/u'
ircmaxell
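As a rough illustration of the u modifier, here is a sketch that writes the subject with explicit byte escapes and the umlauts in the pattern as code points, so it does not depend on the encoding of the PHP source file. The \x{...} notation is a substitution made for this example; the literal characters work just as well when the file really is saved as UTF-8.

```php
<?php
// "abcdeför ahügöld ;:" encoded as UTF-8 (22 bytes).
$string4 = "abcdef\xC3\xB6r ah\xC3\xBCg\xC3\xB6ld ;:";

// With the u modifier, \x{E4}, \x{F6} and \x{FC} denote the code points
// of ä, ö and ü. Everything outside the class is replaced by a space.
$string1 = preg_replace('/[^A-Za-z0-9\x{E4}\x{F6}\x{FC}!&_=+-]/u', ' ', $string4);

var_dump($string1);
```

The umlauts survive, and the space, the semicolon and the colon each become a space.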
I don't know why, but this doesn't seem to work.
ndi
Are you sure those are the exact characters? That they are not character entities? Can you show us the source strings that you're not getting to work?
ircmaxell
Yes, sure: $string4="abcdeför ahügöld ;:"; Without the /u it replaces the ö and ü; with the /u it outputs nothing.
ndi
It works fine for me here (with or without the `/u`): `var_dump($string1, $string4);` yields: `string(22) "abcdeför ahügöld " string(22) "abcdeför ahügöld ;:"`. What version of PHP? Also, are you saving the file as a UTF-8 file (the PHP source code file)?
ircmaxell
I'm using PHP 5.3 and the source code is saved in UTF-8 format. When I use var_dump with /u I get: NULL string(19) "abcdeför ahügöld ;:". Without /u it gives me: string(19) "abcdef r ah g ld " string(19) "abcdeför ahügöld ;:"
ndi
Sorry guys, it was my fault; I had another mistake in my script. The regex now works as expected, but I still have problems with the charset and with replacing öäü in different charsets.
ndi
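A note on the NULL result reported above: preg_replace returns NULL on error, and with the u modifier one such error is a subject that is not valid UTF-8. The reported length of 19 bytes would fit a single-byte encoding (the same text in UTF-8 is 22 bytes), though that is only a guess about the original script. The sketch below checks the error code and converts the input first; clean_string is a hypothetical helper, and ISO-8859-1 as the source charset is an assumption that has to be adapted to the real data.

```php
<?php
// Hypothetical helper: cleans a string, converting it to UTF-8 first when
// it is not valid UTF-8. The default source charset is an assumption.
function clean_string($input, $assumedCharset = 'ISO-8859-1')
{
    // An empty pattern with /u matches any valid UTF-8 string and returns
    // false for invalid byte sequences.
    if (preg_match('//u', $input) !== 1) {
        $converted = iconv($assumedCharset, 'UTF-8', $input);
        if ($converted === false) {
            throw new RuntimeException('Could not convert input to UTF-8');
        }
        $input = $converted;
    }

    $result = preg_replace('/[^A-Za-z0-9\x{E4}\x{F6}\x{FC}!&_=+-]/u', ' ', $input);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, code ' . preg_last_error());
    }
    return $result;
}

// "abcdeför ahügöld ;:" in ISO-8859-1 (19 bytes, invalid as UTF-8).
$latin1 = "abcdef\xF6r ah\xFCg\xF6ld ;:";

// Without conversion, the u modifier rejects the subject.
$raw = preg_replace('/[^A-Za-z0-9\x{E4}\x{F6}\x{FC}!&_=+-]/u', ' ', $latin1);
$rawError = preg_last_error();

var_dump($raw, $rawError === PREG_BAD_UTF8_ERROR, clean_string($latin1));
```

Converting once at the input boundary and working in UTF-8 from then on tends to be simpler than handling several charsets inside each regex.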