I have the following HTML:

<html ><body >Body text <div >div content</div></body></html>

How can I get the content of <body> without the nested <div>? I need to get 'Body text', but I have no idea how to do this.

The result of running

$domhtml = new DOMDocument();
$domhtml->loadHTML($html);
print $domhtml->getElementsByTagName('body')->item(0)->nodeValue;

is 'Body textdiv content', which is not what I want.

A: 
$domhtml = new DOMDocument();
$domhtml->loadHTML($html);
print $domhtml->getElementsByTagName('body')->item(0)->textContent;
mcandre
Did you try it? I'm afraid it does not work as expected. As I recall, DOMNode does not have an innerHTML property.
altern
A: 

I prefer DOMXPath for problems like this. It's very flexible.

$domhtml = new DOMDocument();
$domhtml->loadHTML($html);
$xpath = new DOMXPath($domhtml);
$query = "/html/body/text()"; // selects all text nodes that are direct children of body

$txtnodes = $xpath->query($query);

foreach ($txtnodes as $txt) {
    echo $txt->nodeValue;
}
dnagirl
A: 

Based on the comments on php.net, this should work for you:

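$domhtml = new DOMDocument();
$domhtml->loadHTML($html);
// getElementsByTagName() returns a DOMNodeList, so pick the element with item(0) first
print $domhtml->getElementsByTagName('body')->item(0)->firstChild->nodeValue;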
John
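
Reading only the first child works when the text comes before any nested element, but it misses text that follows the <div>. As a rough sketch of a more general approach without XPath, one could loop over the direct children of <body> and keep only the text nodes. The helper name directText below is made up for illustration and is not part of the DOM extension; the HTML string is the one from the question.

```php
<?php
// Hypothetical helper (not a built-in): returns the text that sits directly
// inside $element, ignoring the text of any nested elements.
function directText(DOMNode $element): string
{
    $text = '';
    foreach ($element->childNodes as $child) {
        // Keep only text nodes; element children such as <div> are skipped.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    // Drop the whitespace left over around the removed elements.
    return trim($text);
}

$html = '<html><body>Body text <div>div content</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$body = $dom->getElementsByTagName('body')->item(0);
echo directText($body), "\n";
```

With the HTML from the question this prints Body text. The result is trimmed because the original markup leaves a trailing space before the <div>; remove the trim() call if that whitespace matters.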