ansaurus

Question

Pulling out a full node with child nodes using XPath

Answer 1

+2 A:

Your XPath is fine, though you can remove the final /. as that's redundant:

/atom/content

All of the HTML is inside a <![CDATA[ ]]> section, so in the XML DOM you actually have only text there. The <i> and <b> tags are not parsed as tags; they just show up as text. Using a CDATA section is exactly the same as if your XML were written like this:

<atom>
    <content>
      At first glance you may ask, &amp;#8220;what &lt;i&gt;exactly&lt;/i&gt;
      do you mean?&amp;#8221; It means that we want to help &lt;b&gt;you&lt;/b&gt; figure...
    </content>
</atom>

So, it's whatever you're doing with the <content> element afterwards that's dropping those tags. Are you later parsing the text as HTML, or running it through a filter, or something like that?
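To check that the CDATA payload really is plain text, a minimal sketch along these lines may help. The sample document is hypothetical (a shortened stand-in for the feed in the question), and it assumes the SimpleXML extension is enabled:

```php
<?php
// Hypothetical sample document; the CDATA payload stands in for the feed's HTML.
$xml = simplexml_load_string(
    '<atom><content><![CDATA[what <i>exactly</i> do you mean?]]></content></atom>'
);

// xpath() returns an array of matching SimpleXMLElement objects.
$content = $xml->xpath('/atom/content')[0];

// The CDATA payload is character data only, so <content> has no child elements.
$childCount = count($content->children());

// Casting to string yields the text, with the tags intact as literal characters.
$text = (string) $content;

var_dump($childCount);
var_dump($text);
```

If the string still contains <i> and <b> at this point, the tags are being lost in a later step rather than in the XPath query.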

John Kugelman 2009-10-14 14:15:13

Removed the trailing period... however the question has changed somewhat.

null 2009-10-14 18:10:04

I don't think XPath is the problem, so can you post your PHP code?

John Kugelman 2009-10-14 19:57:49

Answer 2

A:

I don't know if SimpleXML is different, but it seems to me you need to make sure you're selecting all node types and not just text. In standard XPath you would use /body/div/node().
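As a rough illustration of the node() test, here is a sketch using PHP's DOM extension rather than SimpleXML. The markup is a hypothetical, ASCII-only sample, and the variable names are placeholders:

```php
<?php
// Hypothetical sample markup modelled on the question's structure.
$doc = new DOMDocument();
$doc->loadXML('<body><div>what <i>exactly</i> do you mean?</div></body>');

$xpath = new DOMXPath($doc);

// node() matches every child of <div>: text nodes and elements alike.
$inner = '';
foreach ($xpath->query('/body/div/node()') as $node) {
    // saveXML($node) serializes just that node, markup included.
    $inner .= $doc->saveXML($node);
}

echo $inner, "\n";
```

Concatenating the serialized children gives the inner markup of the div without the div tags themselves.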

ChrisCM 2009-10-14 19:08:38

Answer 3

+1 A:

SimpleXML doesn't handle text nodes well (it only models elements and attributes), so you'll have to use a custom solution instead.

You can use asXML() on each div element then remove the div tags, or you can convert the div elements to DOMNodes then loop over $div->childNodes and serialize each child. Note that your HTML entities will most likely be replaced by the actual characters if available.
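Both options could look roughly like the sketch below. The sample markup is hypothetical and ASCII-only, and the regular expression assumes the outer element is a non-empty <div>...</div>; treat it as a starting point, not a robust implementation:

```php
<?php
// Hypothetical sample markup; ASCII only so entity handling does not affect the result.
$body = simplexml_load_string('<body><div>help <b>you</b> figure</div></body>');

foreach ($body->xpath('/body/div') as $div) {
    // Option 1: serialize the whole element, then strip the outer <div> tags.
    $viaAsXml = preg_replace('~^<div\b[^>]*>|</div>$~', '', $div->asXML());

    // Option 2: switch to DOM and serialize each child node in turn.
    $domDiv = dom_import_simplexml($div);
    $viaDom = '';
    foreach ($domDiv->childNodes as $child) {
        $viaDom .= $domDiv->ownerDocument->saveXML($child);
    }

    var_dump($viaAsXml, $viaDom);
}
```

The DOM route avoids string surgery on the serialized element, which is why it is likely the safer of the two when the div carries attributes or nested divs.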

Alternatively, you can take a look at the SimpleDOM project and use its innerHTML() method.

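// Assumes SimpleDOM.php from the SimpleDOM project is on the include path.
require 'SimpleDOM.php';

$html = 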
'<body>
    <div>
      At first glance you may ask, &#8220;what <i>exactly</i>
      do you mean?&#8221; It means that we want to help <b>you</b> figure...
    </div>
</body>';

$body = simpledom_load_string($html);

foreach ($body->xpath('/body/div') as $div)
{
    var_dump($div->innerHTML());
}

Josh Davis 2009-11-12 16:06:09
