I am parsing an HTML document using the DOMDocument class in PHP. I want to get the nodeValue of a div element, but it is giving me null:

<div id="summary">
   Hi, my name is <span>ABC</span>
   <br/> 
   address is here at stackoverflow...
   <span>....
   ....
</div>

I want to get the value inside the div, and the code I wrote was:

$div_node=$dom->getElementById("summary");
$node_value=$div_node->nodeValue;

But it is giving me a null value. Please help.

A: 

The DOMDocument class requires valid HTML, and your div tag isn't closed. When getElementById returns null, it means it can't find the element.

Obsidian
+1  A: 

The id is not registered in the document, so it cannot be queried. One option is to go through the HTML, explicitly declaring which attribute of each element is its id; another is to parse the document against a DTD; and a third is to drop getElementById and use XPath instead.

For the last option you would use something like this (the key point being the XPath query):

$xpath     = new DOMXPath($dom);
$summaries = $xpath->query('//div[@id="summary"]');
$summary   = 'unknown';
if ($summaries->length > 0) {
    $summary = $summaries->item(0)->nodeValue;
}
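
The first option (explicitly declaring which attribute is the id) could look roughly like the sketch below. It assumes a small, well-formed stand-in for the markup in the question, loaded with loadXML so that no ids are registered up front; the $markup string and its <root> wrapper are made up for illustration. DOMElement::setIdAttribute() is what marks the attribute as an id.

```php
<?php
// Hypothetical, well-formed stand-in for the markup in the question.
$markup = '<root><div id="summary">Hi, my name is <span>ABC</span></div></root>';

$dom = new DOMDocument();
$dom->loadXML($markup);

// Walk every element that carries an "id" attribute and declare that
// attribute to be of type ID, so getElementById() can find the element.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@id]') as $element) {
    $element->setIdAttribute('id', true);
}

// The lookup from the question, with a fallback in case nothing matches.
$div  = $dom->getElementById('summary');
$text = ($div !== null) ? $div->nodeValue : 'unknown';
```

Note that this still relies on an XPath query to find the elements to register, so it is mainly worthwhile when getElementById is called many times afterwards.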
salathe