Hi, I'm having an encoding problem with a WordPress feed that I just can't seem to figure out.

I was loading my feed with DOMDocument->load, but then switched to file_get_contents and am now using ->loadXML, with the same results. I switched to loadXML so I could manipulate the feed first if needed.

The correct output that I'm looking for is - ‘ £. If I just echo the result of an XPath query, I get - ‘ £. If I echo with utf8_decode, I get - ? £. That's a lot better, but the question mark should be an apostrophe.

If I loop through each node of the DOMDocument once it is loaded, I get the correct output. So it seems that it's being handled incorrectly in XPath.

Any thoughts?

The feed is http://shredeasy.com/blog/category/news/feed

Here is the function that is being called:

function getPostsInCategory($feed=NULL){
    if(is_null($feed)){ echo "Wrong Usage. Need a valid Category Feed.  Most likely from getCategories()."; return false; }
    $feedx = file_get_contents($feed);
    $xml = new DOMDocument();
    $xml->loadXML($feedx);
    //$this->showDOMNode($xml);


    //$xml->load($feed);
    $xpath = new DomXPath($xml);
    $xpath->registerNamespace("content", "http://web.resource.org/rss/1.0/modules/content/");

    $cat = array();
    foreach($xml->getElementsByTagName('item') as $c){
        $elements = array();
        $elements["title"] = $xpath->query("title", $c)->item(0)->nodeValue;
        echo utf8_decode($elements["title"]);

I have been trying to figure this out for hours and I keep circling back to the wrong thing.

Thanks for the help!

You know what, it seems that apostrophes are turning into question marks... Gosh! I don't know if that's the only issue or not.

+1  A: 

The string being echoed is encoded in UTF-8.

  • If your page is encoded in UTF-8, you can just echo it, possibly calling htmlspecialchars with the third argument set to "UTF-8".
  • Otherwise, you have to convert it first to whatever encoding your webpage is using. See iconv and mb_convert_encoding.
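
A minimal sketch of the two options, not a drop-in fix. It assumes PHP 7 or later with the mbstring extension loaded; the variable names are made up for illustration, and $title stands in for the nodeValue returned by the XPath query. The last conversion is probably why utf8_decode printed "?": it targets ISO-8859-1, which has a pound sign but no curly quote.

```php
<?php
// Hypothetical stand-in for $xpath->query("title", $c)->item(0)->nodeValue.
// DOM hands strings back as UTF-8, so this is "‘ £" in UTF-8 bytes.
$title = "\u{2018} \u{00A3}";

// Option 1: the page itself is served as UTF-8
// (for example via header('Content-Type: text/html; charset=UTF-8')).
// Escape for HTML and echo; no charset conversion is needed.
$forUtf8Page = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

// Option 2: the page uses another charset, so convert first.
// Windows-1252 has the curly quote (byte 0x91) and the pound sign (byte 0xA3).
$forCp1252Page = mb_convert_encoding($title, 'Windows-1252', 'UTF-8');

// ISO-8859-1 has no curly quote, so mbstring substitutes "?" by default,
// which matches the "? £" symptom seen with utf8_decode.
$forLatin1Page = mb_convert_encoding($title, 'ISO-8859-1', 'UTF-8');

echo $forUtf8Page, "\n";
```

Whichever option applies, the charset declared by the page and the bytes actually echoed have to agree; a mismatch is what produces garbled characters.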
Artefacto
I meant to reply to this, but htmlspecialchars with UTF-8 as an argument was the answer.
Senica Gonzalez