Hello, I'm using the PDO class, but I'm trying to remove all characters except...:

function cleaner($str){
    return preg_replace('/[^a-zA-Z0-9éàêïòé\,\.\']/',' ',trim($str));
}

As you can see, it's a simple function, but it also removes the characters éàêïòé, which it should keep.

example: cleaner('$#$<<>-//La souris a été mangée par le chat ') //returns

La souris a t mang e par le chat (The mouse has been eaten by the cat :) )

Any help will be appreciated.

+1  A: 

You need to add the /u pattern modifier to your pattern to turn on UTF-8 support in PCRE. This assumes that both the script file and the input string are already encoded in UTF-8.

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

m1tk4
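A minimal sketch of what the /u modifier implies for error handling, assuming the script file is saved as UTF-8 without a BOM. The helper name cleaner_utf8() is hypothetical and not from the thread. With /u, PCRE expects the subject to be valid UTF-8, and preg_replace() returns null when it is not, so it may be worth checking for that case:

```php
<?php
// Hypothetical helper (not from the thread): a cleaner() variant that uses
// the /u modifier and reports malformed input instead of silently failing.
function cleaner_utf8($str)
{
    // \p{L} matches any Unicode letter; it needs the /u modifier here.
    // With /u, preg_replace() returns null if $str is not valid UTF-8.
    $result = preg_replace('/[^\p{L}0-9,.\']/u', ' ', trim($str));

    if ($result === null) {
        // preg_last_error() would report PREG_BAD_UTF8_ERROR in this case.
        return false;
    }

    return $result;
}

// Assumes this source file itself is saved as UTF-8.
$ok = cleaner_utf8('$#$<<>-//La souris a été mangée par le chat ');

// "\xE9" is "é" in Latin-1, which is not a valid UTF-8 sequence.
$bad = cleaner_utf8("mang\xE9e");
```

Here $ok keeps the accented letters, while $bad is false because the Latin-1 byte makes the subject invalid UTF-8.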
+1  A: 
$str = '$#$<<>-//La souris a été mangée par le chat ';
$str = preg_replace('/[^a-zA-Z0-9éàêïòé\,\.\']/u',' ',trim($str));

$str = '$#$<<>-//La souris a été mangée par le chat ';
$str = preg_replace('/[^\p{L}\,\.\']/u',' ',trim($str));

Both snippets worked for me on PHP 5.3. The second regular expression is less restrictive about letters: it accepts all Unicode letters. Note that, unlike the first, it no longer accepts digits.

kiamlaluno
Thanks for your help. I'm using Npp (Notepad++), and it adds **** to the file. I think those characters break the encoding.
jartaud
@jartaud: Which PHP version are you using?
kiamlaluno
Are those characters added at the beginning of the file?
kiamlaluno
@kiamlaluno, sorry for the delay. I'm using PHP 5.3, and yes, those characters are added at the beginning. I've now set the encoding to UTF-8.
jartaud
I thought the editor was adding the BOM, but, AFAIK, the characters used for the BOM should not be visible in the edited file.
kiamlaluno
@kiamlaluno, thanks again, buddy. I opened the file with PHPDesigner and saw the BOM.
jartaud
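For anyone who wants to check for the BOM without relying on an editor, here is a rough sketch. The UTF-8 BOM is the three bytes EF BB BF at the very start of a file. The helper names has_utf8_bom() and strip_utf8_bom() are hypothetical and not from the thread:

```php
<?php
// Hypothetical helpers: detect and strip a UTF-8 byte order mark.
// The UTF-8 BOM is the byte sequence EF BB BF at the start of the data.
function has_utf8_bom($contents)
{
    return strncmp($contents, "\xEF\xBB\xBF", 3) === 0;
}

function strip_utf8_bom($contents)
{
    // The cast keeps the result a string on older PHP versions, where
    // substr() returns false when nothing is left after the BOM.
    return has_utf8_bom($contents) ? (string) substr($contents, 3) : $contents;
}

// In-memory sample standing in for the contents of a file.
$withBom    = "\xEF\xBB\xBFLa souris";
$withoutBom = strip_utf8_bom($withBom);
```

To inspect a real file, the same check could be applied to the result of file_get_contents() on its path.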