I have a feed with images in the description node. How can I parse out just the image URL and just the description text with no line break in between?

<description>&lt;img src='http://example.com/100915gignac-clement_g_70x70.jpg'&gt;&lt;/img&gt;&lt;br /&gt;(Source: Example.com) Québec annonce qu'une autorisation ministérielle sera nécessaire pour une prise de participation de plus de 30&amp;#160;% de la nouvelle société fusionnée Investissement Québec dans une entreprise.</description>

Thanks for any help!

+1  A: 

Pass the content of the description node to another SimpleXmlElement.

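// Parse the feed's <description> node (the input string must be valid UTF-8).
$sxe  = new SimpleXMLElement("<description>&lt;img src='http://example.com/100915gignac-clement_g_70x70.jpg'&gt;&lt;/img&gt;&lt;br /&gt;(Source: Example.com) Québec annonce qu'une autorisation ministérielle sera nécessaire pour une prise de participation de plus de 30&amp;#160;% de la nouvelle société fusionnée Investissement Québec dans une entreprise.</description>");
// Interpolating $sxe casts it to a string with the entities decoded;
// the fragment has no single root element, so wrap it in one before parsing again.
$img  = new SimpleXMLElement("<root>$sxe</root>");
$desc = (string) $img;             // description text only, without the markup
$src  = (string) $img->img['src']; // image URL

var_dump($desc, $src);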

SimpleXML decodes the entities by itself when the node is cast to a string, so the escaped markup comes back as real tags that the second SimpleXmlElement can parse.

Gordon
No dice...
Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE9 0x62 0x65 0x63
Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: 5gignac-clement_g_70x70.jpg'></img><br />(Source: Example.com) Qu
Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: ^
Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML'
ace
@ace It works fine here. Did you add the root tags? They are needed. Also, if the entities are not decoded, run the description node through html_entity_decode before making it a new SimpleXmlElement.
Gordon
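A minimal sketch of the html_entity_decode fallback mentioned in the comment above, assuming the string read from the description node still contains escaped markup. The sample string and the variable names are made up for illustration, not taken from the real feed.

```php
<?php
// Hypothetical raw string: the markup is still entity-escaped.
$raw = "&lt;img src='http://example.com/a.jpg'&gt;&lt;/img&gt;&lt;br /&gt;(Source: Example.com) Some text.";

// Turn &lt; and &gt; back into real angle brackets.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// The fragment has no single root element, so wrap it in one.
$fragment = new SimpleXMLElement("<root>$decoded</root>");

$src  = (string) $fragment->img['src']; // image URL from the src attribute
$text = trim((string) $fragment);       // text that follows the <br />

var_dump($src, $text);
```

One caveat: html_entity_decode also turns &amp;amp; into a bare ampersand, so if the description text itself contains an escaped ampersand the decoded fragment may no longer be well-formed XML.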
Working now. I had to run it through the utf8_encode function for some reason. Many thanks!
ace
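The 0xE9 byte in the warning is "é" in ISO-8859-1, which would explain why converting the string to UTF-8 fixed the parse error. Since utf8_encode is deprecated as of PHP 8.2, here is a rough sketch of the same fix using mb_convert_encoding (requires the mbstring extension). The helper name toUtf8 and the sample string are hypothetical, and the sketch assumes any input that is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper: make sure a string is UTF-8 before SimpleXML sees it.
// Assumes any non-UTF-8 input is ISO-8859-1, which matches the 0xE9 byte
// ("é" in Latin-1) reported in the parser warning.
function toUtf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid, avoid double-encoding
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// "Québec" encoded as ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "<img src='http://example.com/a.jpg'></img><br />Qu\xE9bec annonce";

// Convert first, then wrap in a root element and parse.
$fragment = new SimpleXMLElement('<root>' . toUtf8($latin1) . '</root>');

$src  = (string) $fragment->img['src']; // image URL
$text = (string) $fragment;             // description text, now valid UTF-8

var_dump($src, $text);
```

Checking with mb_check_encoding first avoids double-encoding a feed that is already UTF-8, which an unconditional conversion would do.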