Hi all,

I'm trying to replace the string "Red Dwarf (TV Series 1988â€") - IMDb" with "Red Dwarf (TV Series 1988') - IMDb".

I have a translation table of these funny characters in an array. I tried to replace them using str_replace, but it did not work. Can anybody suggest a workaround? Here is the relevant snippet of the code:

function replaceFunnyChar( $input ){

$translation = array(
    '’' => "'",
    "â€\"" => '-',
    'é' => 'é',
    'è' => 'è',
    '“' => '"',
    'â€' => '"',
    '‘' => "'",
    'â' => 'ã',
    'Ã"' => 'ä',
    'â€"' => '–',
    'Ä«' => 'ī',
    '阴' => '阴',
    'é™°' => '陰',
    "阳" => "阳",
    "陽" => "陽",
    '´' => "'",
    'ü' => 'ü',
    "Ã,Ã'" => "'",
    '•' => '–'
);


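// Start from the input and feed each result into the next replacement;
// passing $input every time would keep only the last substitution.
$output = $input;
foreach( $translation as $find => $replace ){
    $output = str_replace($find, $replace, $output);
    //$output = preg_replace("/" . $find . "/", $replace, $output );
}
return $output;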
}
+2  A: 

It is best to detect the encoding of the data you have (if you are scraping, it is declared in the HTTP Content-Type header or, failing that, in the meta tag in the HTML); then you can use something such as iconv to convert it: http://php.net/manual/en/book.iconv.php
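
As a rough sketch of that conversion step (toUtf8 is a made-up helper name, and the source charset is assumed to have been read from the response already), something along these lines should work:

```php
<?php
// Hypothetical helper: convert scraped bytes to UTF-8, given the charset
// announced by the server or the page.
function toUtf8(string $raw, string $sourceCharset): string
{
    if (strcasecmp($sourceCharset, 'UTF-8') === 0) {
        return $raw; // already UTF-8, nothing to convert
    }
    $converted = iconv($sourceCharset, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset");
    }
    return $converted;
}

// "é" is the single byte 0xE9 in ISO-8859-1 and the two bytes 0xC3 0xA9 in UTF-8.
$latin1 = "caf\xE9";
echo bin2hex(toUtf8($latin1, 'ISO-8859-1')), "\n"; // 636166c3a9
```

If the announced charset is wrong, the result will still be garbled, so it is worth checking the output on a few known strings.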

If the data you get is UTF-8, you don't actually need to convert it. Just store it and make sure your DBMS is set up to support UTF-8. Then when displaying the data again, make sure you specify UTF-8 on your webpage.
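
If you are not sure whether what you received really is UTF-8, a quick check like the following may help. This is only a sketch: isValidUtf8 is a hypothetical name, and it relies on preg_match() returning false for a malformed subject when the /u modifier is used.

```php
<?php
// Hypothetical helper: true when $data is a well-formed UTF-8 byte sequence.
function isValidUtf8(string $data): bool
{
    // With the /u modifier PCRE validates the subject first; on malformed
    // UTF-8 preg_match() returns false instead of 1.
    return preg_match('//u', $data) === 1;
}

var_dump(isValidUtf8("caf\xC3\xA9")); // bool(true)
var_dump(isValidUtf8("caf\xE9"));     // bool(false), a lone ISO-8859-1 byte
```

Note that this only says the bytes are well-formed UTF-8; text that was already garbled and then re-encoded as UTF-8 will still pass.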

If you are using the Windows command line to show the characters, it is a little more complicated, as the Windows command line doesn't use UTF-8 by default. Try Ubuntu or Mac OS X.

Also, if you already have the data and cannot download it again, check how you are displaying the characters: if they are shown on a webpage, the web browser can garble them further if it assumes a different encoding from the one the data is actually in. You can also dump the bytes out and replace the string using its byte sequence instead of a quoted string as in the original code.
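
Here is a minimal sketch of the byte-sequence approach. It assumes the garbled run in the title is an en dash whose UTF-8 bytes were decoded as Windows-1252 and then saved as UTF-8 again; dump your own data first, because the actual bytes may differ.

```php
<?php
// Assumption: the garbled run is an en dash (UTF-8 bytes E2 80 93) that was
// decoded as Windows-1252 and re-encoded as UTF-8, i.e. "â", "€", "“".
$garbled = "\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C";
$title   = "Red Dwarf (TV Series 1988" . $garbled . ") - IMDb";

// Dump the bytes to see exactly what the string contains.
echo bin2hex($garbled), "\n"; // c3a2e282ace2809c

// Replace by byte sequence, independent of the encoding the .php file is saved in.
$fixed = str_replace($garbled, '-', $title);
echo $fixed, "\n"; // Red Dwarf (TV Series 1988-) - IMDb
```

Writing the search string with \x escapes means the match no longer depends on how your editor saved the source file, which is a common reason a quoted "funny character" never matches.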

動靜能量
A: 

Off the top of my head, that's a decoding error; you can probably get rid of it by playing around with the charsets for a while.

Anyhow, you can also just drop every char over ASCII 127:

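// Callback for preg_replace_callback(): $entity[0] holds the matched byte.
function _dropAsciOver127($entity){
    // Drop bytes above the 7-bit ASCII range, keep everything else.
    if(ord($entity[0]) > 127){
        return '';
    }else{
        return $entity[0];
    }
}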

$weird = 'Red Dwarf (TV Series 1988â€") - IMDb';
$cool = preg_replace_callback('/[^\w\d ]/i','_dropAsciOver127', $weird);
print $cool; // prints Red Dwarf (TV Series 1988") - IMDb
Hannes
A: 

I think your problem is your charset, and one solution is to save the document as UTF-8 (without BOM) in your text editor. Otherwise you can declare the charset in your page, like this:

HTML

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

PHP

header('Content-type: text/html; charset=utf-8');

Remember to set the header at the top of the page, before any output is sent! If you are still having problems with the charset, try changing it from UTF-8 to an ISO charset such as ISO-8859-1 or something like that.

Holsteinkaa