Hi guys, from a PHP script I'm downloading an RSS feed like this:

$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss','r') 
 or die('Error reading RSS data.'); 

The feed is a Spanish news feed. After downloading it, I parse everything into one variable that holds only the contents of the <description> tag of every <item>. The problem is that when I echo the variable, all the accented characters come out garbled, like this:

echo($result); // this print: el ministerio pãºblico investigarã¡ la publicaciã³n en la primera pã¡gina

I could write a HUGE case statement that searches for every character and replaces it with the right one (e.g. ã¡ with á, and so on), but isn't there a single function that does this? Or, even better, is there a way to download the content into $fp without this encoding problem? Thanks!
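For reference, the garbled output above matches what happens when UTF-8 bytes are decoded as ISO-8859-1 (Latin-1); it is a character-set mismatch rather than HTML encoding. A minimal sketch reproducing it, assuming the mbstring extension is enabled and the script file is saved as UTF-8 (the sample word comes from the output above):

```php
<?php
// Start from correctly encoded UTF-8 text.
$original = 'investigará';

// Simulate the bug: treat the UTF-8 bytes as if they were ISO-8859-1 and
// re-encode them to UTF-8. The two bytes of "á" (0xC3 0xA1) become two
// separate characters, "Ã" (U+00C3) and "¡" (U+00A1).
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');
echo $garbled, "\n";                         // investigarÃ¡

// Lowercasing the garbled text gives exactly the form seen in the output above.
echo mb_strtolower($garbled, 'UTF-8'), "\n"; // investigarã¡
```

So each "ã¡"-style pair stands for one accented letter, which is why a single conversion call, not a per-character lookup table, is the usual fix.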

Full code:

<?php
$acumula = "";
$insideitem = false;
$tag = '';
$title = '';
$description = '';
$link = '';

function startElement($parser, $name, $attrs) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == 'ITEM') {
        $insideitem = true;
    }
}

function endElement($parser, $name) {
    global $insideitem, $tag, $title, $description, $link, $acumula;
    if ($name == 'ITEM') {
        $acumula = $acumula . trim($title) . "<br>" . trim($description);
        $title = '';
        $description = '';
        $link = '';
        $insideitem = false;
    }
}

function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        switch ($tag) {
            case 'TITLE':
                $title .= $data;
                break;
            case 'DESCRIPTION':
                $description .= $data;
                break;
            case 'LINK':
                $link .= $data;
                break;
        }
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, 'startElement', 'endElement');
xml_set_character_data_handler($xml_parser, 'characterData');
$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss', 'r')
    or die('Error reading RSS data.');
while ($data = fread($fp, 4096)) {
    xml_parse($xml_parser, $data, feof($fp))
        or die(sprintf('XML error: %s at line %d',
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
}
//echo $acumula;
fclose($fp);
xml_parser_free($xml_parser);
echo($acumula); // THIS IS $RESULT!
?>
+3  A: 

EDIT

Since you're already using the XML parser, you're guaranteed the text it hands to your handlers is UTF-8 (that is its default target encoding).
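To make that explicit instead of relying on the default, here is a sketch that declares the input encoding in `xml_parser_create()` and sets `XML_OPTION_TARGET_ENCODING`. The inline one-item feed is made up for illustration and stands in for the Google News URL; closures replace the question's global-variable handlers only to keep the example short.

```php
<?php
// Hypothetical minimal feed standing in for the real RSS.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<rss><channel><item><description>El ministerio público investigará</description></item></channel></rss>';

$descriptions = [];
$current = null;

// 'UTF-8' here declares the input encoding; the option below fixes the
// encoding of the strings passed to the handlers.
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$current) {
        // Element names are upper-cased by default (case folding is on).
        if ($name === 'DESCRIPTION') {
            $current = '';
        }
    },
    function ($p, $name) use (&$current, &$descriptions) {
        if ($name === 'DESCRIPTION') {
            $descriptions[] = $current;
            $current = null;
        }
    }
);

xml_set_character_data_handler($parser, function ($p, $data) use (&$current) {
    // Character data can arrive in several chunks, so append.
    if ($current !== null) {
        $current .= $data;
    }
});

xml_parse($parser, $xml, true);

echo $descriptions[0], "\n"; // El ministerio público investigará
```

If this prints correctly on the command line but not in the browser, the remaining mismatch is between UTF-8 and the charset the page is served with, which is what the conversion below addresses.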

If your page is served as ISO-8859-1, or even ASCII, you can convert the non-ASCII characters to HTML entities:

$result = mb_convert_encoding($result, "HTML-ENTITIES", "UTF-8");
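Note that the `HTML-ENTITIES` target of `mb_convert_encoding()` is deprecated as of PHP 8.2. A sketch of two replacements, assuming `$result` is UTF-8 as stated above: `htmlentities()`, which also escapes `<`, `>`, `&` and quotes, and `mb_encode_numericentity()`, which only touches non-ASCII characters and so leaves any markup in the text intact. (Serving the page itself as UTF-8 avoids the conversion altogether.)

```php
<?php
$result = 'El ministerio público investigará';

// Named entities for every character that has one
// (this also escapes <, >, & and quotes).
$named = htmlentities($result, ENT_QUOTES, 'UTF-8');
echo $named, "\n";   // El ministerio p&uacute;blico investigar&aacute;

// Numeric entities for code points 0x80 and above only;
// plain ASCII, including any HTML tags, is left untouched.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$numeric = mb_encode_numericentity($result, $convmap, 'UTF-8');
echo $numeric, "\n"; // El ministerio p&#250;blico investigar&#225;
```

Since the feed descriptions contain HTML, the numeric-entity variant is probably the closer match to the original `mb_convert_encoding()` call.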

Alternatively, use a library that handles this for you, e.g. the DOM extension or SimpleXML. Example with DOM:

$d = new DOMDocument();
$d->load('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss');
//now all the data you get will be encoded in UTF-8
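Going one step further with DOM, a sketch that collects the `<description>` of every `<item>`, which is what the question's handler-based parser does. It uses `loadXML()` on a small made-up inline feed so it runs offline; with the real feed you would call `load()` with the URL as above.

```php
<?php
// Hypothetical two-item feed used instead of the live URL.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
  <item><title>Uno</title><description>El ministerio público</description></item>
  <item><title>Dos</title><description>La primera página</description></item>
</channel></rss>
XML;

$d = new DOMDocument();
$d->loadXML($rss);

$descriptions = [];
foreach ($d->getElementsByTagName('item') as $item) {
    // textContent is returned as a UTF-8 string.
    $descriptions[] = $item->getElementsByTagName('description')->item(0)->textContent;
}
echo implode('<br>', $descriptions);
```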

Example with SimpleXML:

$url = 'http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss';
if ($sxml = simplexml_load_file($url)) {
    echo htmlspecialchars($sxml->channel->title); //UTF-8
}
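A SimpleXML sketch that builds the same string the question's `endElement()` accumulates (title, `<br>`, description for each item). The inline feed is hypothetical and replaces `simplexml_load_file($url)` so the example runs offline.

```php
<?php
// Hypothetical inline feed in place of simplexml_load_file($url).
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
  <title>Noticias</title>
  <item><title>Uno</title><description>El ministerio público</description></item>
  <item><title>Dos</title><description>La primera página</description></item>
</channel></rss>
XML;

$acumula = '';
if ($sxml = simplexml_load_string($rss)) {
    foreach ($sxml->channel->item as $item) {
        // Casting an element to string yields its UTF-8 text content.
        $acumula .= trim((string) $item->title) . '<br>' . trim((string) $item->description);
    }
}
echo $acumula;
```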
Artefacto
How can I make this new code compatible with my old one? Check the question, I edited it.
DomingoSL
@Dom The code you posted doesn't show how you obtain `$result`.
Artefacto
I'll update the question with all the code.
DomingoSL
With UTF-8 I'm still getting garbled characters.
DomingoSL
@Dom read my edit.
Artefacto
Yes, I implemented it with the code you posted, but I'm still getting garbled characters. Here is the code: http://maracay.dyndns.org/contar/code.txt and here is the page: http://maracay.dyndns.org/contar/index.php
DomingoSL
@Dom Works here partially with encoding ISO-8859-1: http://codepad.viper-7.com/0HCLc3 If you don't do the conversion, there are many more wrong cases (see this http://codepad.viper-7.com/CKIaPD). It seems the problem is that the RSS feed is itself corrupted (has mixed encodings). Not much you can do about that.
Artefacto
OK! Thanks!
DomingoSL
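If the feed really mixes correctly encoded text with double-encoded text, as the comment above suggests, one possible workaround is a per-string heuristic. The sketch below (the helper name `fixMojibake` is hypothetical; it assumes mbstring) undoes one round of "UTF-8 read as Latin-1" only when the result is valid UTF-8 and the conversion is lossless. It cannot recover already-lowercased forms such as `ã¡`, because lowercasing changed the bytes.

```php
<?php
/**
 * Hypothetical helper: undo one round of "UTF-8 bytes decoded as ISO-8859-1".
 * Returns the input unchanged when it does not look like that kind of damage.
 */
function fixMojibake(string $s): string
{
    // Map each character back to a single Latin-1 byte.
    $candidate = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

    // Nothing changed (pure ASCII), or the bytes are not valid UTF-8:
    // the text was probably fine to begin with.
    if ($candidate === $s || !mb_check_encoding($candidate, 'UTF-8')) {
        return $s;
    }

    // Characters outside Latin-1 (e.g. "€") get replaced during conversion,
    // so only accept the repair if it round-trips exactly.
    if (mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') !== $s) {
        return $s;
    }

    return $candidate;
}

echo fixMojibake("p\u{00C3}\u{00BA}blico"), "\n"; // público
echo fixMojibake('público'), "\n";                // público (already correct)
```

Being a heuristic, it can in principle misfire on legitimate Latin-1-looking text, so it is best applied per description rather than to the whole page.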
A: 

You can use PHP's DOMDocument to strip the HTML tags, and PHP's encoding conversion functions to change the encoding of the string.
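A sketch of the tag-stripping part, using the simpler `strip_tags()` rather than DOMDocument, and assuming each description is an HTML fragment that may contain entities (the sample fragment is made up). `html_entity_decode()` then turns entities back into UTF-8 characters. Neither call repairs text that is already mis-encoded.

```php
<?php
// Hypothetical description fragment with markup and HTML entities.
$description = '<b>El ministerio</b> p&uacute;blico &amp; la <a href="#">primera p&aacute;gina</a>';

// 1. Drop the tags, keeping their text.
$text = strip_tags($description);

// 2. Decode entities such as &uacute; into UTF-8 characters.
$text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $text, "\n"; // El ministerio público & la primera página
```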

Svisstack
Example, please?
DomingoSL
@DomingoSL: The simplest option is strip_tags() (http://php.net/manual/en/function.strip-tags.php) to strip the tags ;-)
Svisstack