I'm new to PHP's DOM objects and have a problem I can't find a solution to. I have a DOMDocument with the following HTML:

<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>

I need to get all nodes on the first level (header, content, footer). hasChildNodes() doesn't help, because a first-level node may have no children (header, footer). For now my code looks like this:

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$childs = $dom->getElementsByTagName('div');

But this gets me all divs, including the nested ones (sidebar, info). Any advice?
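A minimal, self-contained reproduction of the behavior described above, assuming the question's markup is passed to loadHTML() as a string (called $html here and concatenated without whitespace for brevity). getElementsByTagName() searches the whole subtree, so the nested sidebar and info divs come back too, in document order:

```php
<?php
// The question's markup: header, content and footer at the top level,
// with sidebar and info nested inside #content.
$html = '<div id="header"></div>'
      . '<div id="content"><div id="sidebar"></div><div id="info"></div></div>'
      . '<div id="footer"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() matches every descendant div, not just the first level.
$ids = array();
foreach ($dom->getElementsByTagName('div') as $div) {
    $ids[] = $div->getAttribute('id');
}

echo implode(', ', $ids), "\n"; // header, content, sidebar, info, footer
```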

+2  A: 

You may need something beyond DOMDocument's own methods, such as SimpleXML or DOMXPath:

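$file = $_SERVER['DOCUMENT_ROOT'] . "/test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXPath($doc);
// "/" on its own selects only the document node. The HTML parser wraps a
// fragment in implied <html><body> elements, so the first-level divs are
// the div children of <body>:
$elements = $xpath->query("/html/body/div");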
ChronoFish
Thanks, that helped.
Deniss Kozlovs
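A self-contained sketch of the XPath idea, plus an equivalent loop without XPath. It assumes the markup is loaded with loadHTML() without the LIBXML_HTML_NOIMPLIED flag, so libxml adds implied html and body elements; note that the XPath expression "/" on its own selects only the document node, not its elements. $html is the question's markup, and $ids / $viaChildNodes are illustrative names:

```php
<?php
// The question's markup, concatenated without whitespace for brevity.
$html = '<div id="header"></div>'
      . '<div id="content"><div id="sidebar"></div><div id="info"></div></div>'
      . '<div id="footer"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Option 1: XPath. The child path /html/body/div stops at the first level,
// unlike getElementsByTagName(), which descends into #content.
$xpath = new DOMXPath($doc);
$ids = array();
foreach ($xpath->query('/html/body/div') as $div) {
    $ids[] = $div->getAttribute('id');
}

// Option 2: no XPath. Walk <body>'s direct children and keep element nodes;
// this works whether or not a div has children of its own, so hasChildNodes()
// is not needed.
$body = $doc->getElementsByTagName('body')->item(0);
$viaChildNodes = array();
foreach ($body->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $viaChildNodes[] = $child->getAttribute('id');
    }
}

echo implode(', ', $ids), "\n"; // header, content, footer
```

Use '/html/body/*' instead of '/html/body/div' if first-level elements other than divs should be included.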
A: 

Here's how I grab the first-level elements (in this case, the top-level TD elements in a table row):

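$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML($tr_element);

$xpath = new DOMXPath($doc);
// First TD of the first row; its siblings are the other top-level cells.
$td = $xpath->query("//tr/td[1]")->item(0);

$arr = array();
while ($td !== null) {
    // Skip any text or comment nodes between cells; only collect elements.
    if ($td->nodeType === XML_ELEMENT_NODE) {
        $innerHTML = self::DOMinnerHTML($td);
        if ($innerHTML !== '') {
            $arr[] = $innerHTML;
        }
    }
    $td = $td->nextSibling;
}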

$arr now contains the inner HTML of the top-level TD elements (empty cells are skipped), but not separate entries for the TDs of nested tables, which you would also get from

$doc->getElementsByTagName('td');
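To make the difference concrete, a small sketch using made-up sample markup (a row whose second cell contains a nested table). getElementsByTagName('td') counts every cell in the document, while walking the outer row's childNodes yields only that row's own cells. The variable names are illustrative, and this uses a foreach over childNodes rather than the nextSibling loop above; both visit the same direct children:

```php
<?php
// Hypothetical sample: the outer row has two cells; the second holds a nested table.
$html = '<table><tr><td>a</td><td><table><tr><td>x</td></tr></table></td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Every TD in the document, including the one inside the nested table.
$allCells = $doc->getElementsByTagName('td');

// The first <tr> in document order is the outer row; keep only its direct TD children.
$outerRow = (new DOMXPath($doc))->query('//tr')->item(0);
$topCells = array();
foreach ($outerRow->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && strtolower($child->nodeName) === 'td') {
        $topCells[] = $child;
    }
}

echo $allCells->length, ' vs ', count($topCells), "\n"; // 3 vs 2
```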

The DOMinnerHTML function is something I snagged somewhere to get the innerHTML of an element/node:

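// Returns the inner HTML of $element by serializing each child node through
// a temporary DOMDocument. $deep is passed to importNode(): with $deep=false
// only the child nodes themselves (not their descendants) are copied.
// Each child's markup is trimmed before being concatenated.
public static function DOMinnerHTML( $element, $deep=true )
{
  $innerHTML = "";
  $children = $element->childNodes;
  foreach ($children as $child)
  {
    // Copy the child into a fresh document so it can be serialized on its own.
    $tmp_dom = new DOMDocument();
    $tmp_dom->appendChild( $tmp_dom->importNode( $child, $deep ) );
    $innerHTML .= trim($tmp_dom->saveHTML());
  }
  return $innerHTML;
}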
Michael Reed
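On PHP 5.3.6 or later, DOMDocument::saveHTML() accepts an optional node argument, so a temporary document per child is not strictly necessary. A minimal sketch; innerHTML() is a hypothetical helper name, and unlike DOMinnerHTML it neither trims each child's markup nor offers a $deep switch:

```php
<?php
// Hypothetical helper: serialize each child with the owner document's
// saveHTML($node) and concatenate the results.
function innerHTML(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<p><b>bold</b> text</p>');
$p = $doc->getElementsByTagName('p')->item(0);

echo innerHTML($p), "\n";
```

Because nothing is trimmed, whitespace between child nodes is preserved exactly as parsed.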