ansaurus

Question

PHP and character encoding problem with Â character

Answer 1

A:

From the PHP Manual Comment Page:

http://www.php.net/manual/en/function.preg-replace.php#96847

And from StackOverflow:

http://stackoverflow.com/questions/3542818/remove-accents-without-using-iconv

shamittomar 2010-08-27 19:14:38

Answer 2

A:

I use this:

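function replaceSpecial($str) {
    // Split into single bytes (str_split is not multi-byte aware).
    $chunked = str_split($str, 1);
    $str = "";
    foreach ($chunked as $chunk) {
        $num = ord($chunk);
        // Keep only bytes 32 (space) through 123 ('{'); everything else,
        // including every byte of a multi-byte UTF-8 character, is dropped.
        if ($num >= 32 && $num <= 123) {
            $str .= $chunk;
        }
    }
    return $str;
}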

akellehe 2010-08-27 19:15:43

You can expand this to allow all ascii characters by changing 32 to 0 and 123 to 255.

akellehe 2010-08-27 19:16:42

This will remove MANY more characters than just accents.

shamittomar 2010-08-27 19:17:52

right, all non-pretty, non-ascii characters

akellehe 2010-08-27 19:20:18

First off, ASCII and UTF-8 only overlap in the range 0 to 127. Bytes 128 and above are not ASCII; in UTF-8 they only occur as parts of multi-byte sequences, so filtering them one byte at a time can break the encoding. Besides, this is quite a dirty way of doing it. What I would do, if I were you, is simply use the [`iconv`](http://us3.php.net/manual/en/book.iconv.php) function if you need to convert to ASCII: `$str = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string)`, especially since it will transliterate characters for you.

ircmaxell 2010-08-27 19:49:27
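
As a rough sketch of the byte-range point above: the snippet below strips every byte outside 0x00-0x7F with a single regular expression, so multi-byte UTF-8 sequences are removed whole rather than partially. The function name `stripNonAscii` is made up for illustration and does not come from the thread; unlike `iconv` with `//TRANSLIT`, it drops accented letters instead of transliterating them.

```php
<?php
// Hypothetical helper (name invented for this example).
// Removes every byte outside the 7-bit ASCII range 0x00-0x7F.
// No /u modifier: the pattern deliberately works on raw bytes.
function stripNonAscii(string $str): string
{
    return preg_replace('/[^\x00-\x7F]/', '', $str);
}

// "café" in UTF-8: the "é" is the two bytes 0xC3 0xA9, and both are removed.
echo stripNonAscii("caf\xC3\xA9"), "\n"; // prints "caf"
```

Control characters (0-31) and DEL (127) pass through this filter; narrow the range in the pattern if only printable characters are wanted.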

+1up. Thanks for the tip :)

akellehe 2010-08-27 20:01:45

Ahh.. I think I understand the solution, but I'm still not clear why PHP doesn't recognize the characters? I think I'll use something like this, but only strip a few specific chars. Thanks!

Travis 2010-08-27 20:14:33

Answer 3

+3 A:

$string = str_replace('Â','',$string);

How is this 'Â' encoded? If your script file is saved as ISO-8859-1, the string 'Â' is encoded as the single byte 0xC2, while its UTF-8 representation is the two-byte sequence 0xC3 0x82. PHP's str_replace() works on the byte level, i.e. it only "knows" single-byte characters, so the search string only matches if its bytes are identical to the bytes in $string.

see http://docs.php.net/intro.mbstring

VolkerK 2010-08-27 19:23:08
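
To see the difference described above, one can dump the bytes directly. This is a minimal sketch that writes both encodings with hex escapes, so the result does not depend on how the file itself is saved; the variable names are only for illustration.

```php
<?php
// 'Â' written explicitly in each encoding via hex escapes.
$latin1 = "\xC2";      // ISO-8859-1: one byte
$utf8   = "\xC3\x82";  // UTF-8: two bytes

echo bin2hex($latin1), "\n"; // c2
echo bin2hex($utf8), "\n";   // c382

// strlen() counts bytes, not characters.
echo strlen($latin1), " ", strlen($utf8), "\n"; // 1 2

// The ISO-8859-1 needle (0xC2) does not occur in the UTF-8 bytes (0xC3 0x82),
// so a byte-level replace leaves the string untouched.
echo str_replace($latin1, '', $utf8) === $utf8 ? "unchanged" : "changed", "\n"; // unchanged
```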

+1, you can therefore write the replace as: `str_replace(chr(195) . chr(130), '', $string)` (where `195` and `130` are `0xC3` and `0x82` converted from hex to decimal, respectively). Or, since PHP supports hex numbers: `str_replace(chr(0xC3) . chr(0x82), '', $string)`.

ircmaxell 2010-08-27 19:39:07
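
An equivalent way to write the same replacement, sketched here on a made-up sample string, is a double-quoted literal with hex escapes. It behaves the same however the script file is saved, because the needle's bytes are spelled out.

```php
<?php
// Sample input (invented for this example): UTF-8 text with a stray 'Â'.
$string = "Hello\xC3\x82 world";

// "\xC3\x82" is the same two bytes as chr(0xC3) . chr(0x82).
$clean = str_replace("\xC3\x82", '', $string);

echo $clean, "\n"; // Hello world
```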

I also found that mb_ereg_replace() didn't seem to work properly; isn't this exactly what it's for? Your information is extremely useful and I'll be sure to read the documentation you linked. Thanks!

Travis 2010-08-27 20:10:25

@Travis: The parameters you pass to the mbstring functions have to be encoded properly as well. If you have a string literal in your script (like 'Â') then the encoding depends on how you've saved the script file.

VolkerK 2010-08-27 23:37:13
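
One way to act on this, sketched under the assumption that the mbstring extension is loaded and that the text being searched is UTF-8, is to convert the literal to the target encoding before using it as a search string. `$sourceEncoding` is an illustrative variable standing for whatever encoding the script file was saved in.

```php
<?php
// Assumption: the script file is saved as ISO-8859-1, so a literal 'Â'
// would be the single byte 0xC2. It is written as an escape here so the
// example does not depend on the actual file encoding.
$sourceEncoding = 'ISO-8859-1';
$needle = "\xC2";

// Re-encode the needle to match the UTF-8 data it will be compared against.
$needleUtf8 = mb_convert_encoding($needle, 'UTF-8', $sourceEncoding);
echo bin2hex($needleUtf8), "\n"; // c382

$string = "Hello\xC3\x82 world"; // UTF-8 input
echo str_replace($needleUtf8, '', $string), "\n"; // Hello world
```

Saving the script itself as UTF-8 avoids the conversion step entirely, since the literal then already has the right bytes.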
