Hi, I have a function that accepts an arbitrary HTML document and an arbitrary XPath expression. I want to extract the matched node's contents as a string, including the HTML tags. Here's a simplified example...

<?php
$inDocStg = "
    <html><body>
    <div>The best-laid<br> schemes o' <span>mice</span> an' men
        <img src='./mouse.gif'><br>
    </div>
    </body></html>
    ";

$xPathDom = new DOMDocument();
@$xPathDom->loadHTML( $inDocStg );
$xPath = new DOMXPath( $xPathDom );
$matches = $xPath->query( "//div" );
echo $matches->item(0)->nodeValue;
?>

This produces (I'm looking at the generated HTML source, not the browser output)...

The best-laid schemes o' mice an' men

(the HTML tags have been stripped out).

But what I want is...

The best-laid<br> schemes o' <span>mice</span> an' men<img src='./mouse.gif'><br>

Thanks.

A: 

How about wrapping your output in <pre> tags?
echo "<pre>" . $matches->item(0)->nodeValue . "</pre>";

c0mrade
Hi c0mrade. That produces: <pre>The best-laid schemes o' mice an' men</pre>. It's the text contained in the string that I'm interested in, not how it displays in a browser (I'm just echoing it to the browser to see what it has done).
spiderPlant0
A: 

Try giving these two a go:

1

echo $matches->item(0)->textContent;

2

echo $matches->item(0);

The first returns the text content of this node and its descendants, and the second tries to invoke the magic method __toString(); depending on how DOMDocument is built, it could return the value you're already getting.

RobertPitt
Hi, textContent gives the same result as nodeValue. The second suggestion produces an error.
spiderPlant0
__toString() isn't defined for DOMNode.
GZipp
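A small sketch (markup trimmed down from the question) of why neither suggestion helps: for an element node, both nodeValue and textContent return only the concatenated text of its descendant text nodes, and DOMNode has no __toString() to fall back on.

```php
<?php
// For a DOMElement, nodeValue and textContent both return the concatenated
// text of all descendant text nodes, so the markup is already gone.
$doc = new DOMDocument();
// Suppress warnings that libxml may emit for imperfect real-world HTML.
@$doc->loadHTML('<html><body><div>a<br>b <span>c</span></div></body></html>');
$xPath = new DOMXPath($doc);
$div = $xPath->query('//div')->item(0);

var_dump($div->nodeValue);     // string(4) "ab c"
var_dump($div->textContent);   // string(4) "ab c"

// DOMNode defines no __toString(), so `echo $div;` fails with an
// "Object of class DOMElement could not be converted to string" error.
```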
A: 

This will work, but without XPath:

$xPathDom = new DOMDocument();
$xPathDom->loadHTML( $inDocStg );
echo $xPathDom->saveXML($xPathDom->getElementsByTagName('div')->item(0));

or

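$xPathDom = new DOMDocument();
$xPathDom->loadHTML( $inDocStg );
$node = $xPathDom->getElementsByTagName('div')->item(0);
// Pass the node to saveHTML() (PHP >= 5.3.6); with no argument it dumps the whole document
echo $xPathDom->saveHTML($node);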
Centurion
`DOMXPath::query` returns a `DOMNodeList`, so it should work fine, since you're passing in the same type of object: `$xPath->query( "//div" )->item(0)` returns the same node as `$X->getElementsByTagName('div')->item(0)`.
RobertPitt
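Following that point, a sketch of the XPath version: hand the node from `$xPath->query()` straight to `saveHTML()`. This assumes PHP 5.3.6 or later, where `DOMDocument::saveHTML()` accepts an optional node argument; older versions don't support passing a node. `saveXML($node)` works on older versions too, but serializes as XML (e.g. `<br/>` instead of `<br>`). Note the result is the outer HTML, i.e. it still includes the `<div>` wrapper.

```php
<?php
// Serialize the node returned by an XPath query, markup included.
// Assumes PHP >= 5.3.6, where DOMDocument::saveHTML() accepts a node.
$html = "<html><body><div>The best-laid<br> schemes o' <span>mice</span> an' men</div></body></html>";

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xPath = new DOMXPath($doc);

$node = $xPath->query('//div')->item(0);   // any XPath expression could go here
if ($node !== null) {
    // Outer HTML: includes the matched <div> element itself.
    $outer = $doc->saveHTML($node);
    echo $outer, "\n";
}
```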
This doesn't work. I pasted in the code above (the bit after '... Xpath would be') and it just printed out the entire HTML document, not the node contents selected by XPath. Did you get it to work? I also tried saveHTML() on $xPath->query( "//div" )->item(0) and this produced an error.
spiderPlant0
@spiderPlant0: Yes, the edit wasn't working the way you need, so I deleted it and left only the versions without XPath, because it wasn't the answer you were looking for.
Centurion
Also, I can't use getElementsByTagName('div'), as I need a generalized XPath expression (my example is simplified).
spiderPlant0
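Since the desired output leaves out the `<div>` wrapper itself, one possible approach for a general XPath expression is to serialize each child of the matched node and concatenate the results (an "inner HTML"). This is only a sketch: the function name `innerHtmlByXPath` is hypothetical, it assumes PHP 7.1+ (for the `?string` return type), and it only looks at the first match. libxml re-serializes the markup, so the result may differ cosmetically from the source; for instance, single-quoted attributes are typically written back with double quotes.

```php
<?php
// Hypothetical helper: return the inner HTML of the first node matched by an
// arbitrary XPath expression, or null if nothing matches.
// Assumes PHP >= 7.1 (nullable return type) and DOMDocument::saveHTML($node).
function innerHtmlByXPath(string $html, string $expression): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // query() returns false for a malformed expression.
    $nodes = (new DOMXPath($doc))->query($expression);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize each child in turn so the matched element's own tags are left out.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$inDocStg = "<html><body><div>The best-laid<br> schemes o' <span>mice</span> an' men"
          . "<img src='./mouse.gif'><br></div></body></html>";

echo innerHtmlByXPath($inDocStg, '//div'), "\n";
```

On older PHP versions where `saveHTML()` takes no argument, calling `$doc->saveXML($child)` inside the loop is a fallback, at the cost of XML-style output.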