views:

102

answers:

1

I am following the suggestion from this question Robust, Mature HTML Parser for PHP, about parsing html that may be malformed with DOMDocument.

Is there any easy way to loop over the parsed document? So I would like to loop over html like this.

$html='<ul>
         <li>value1</li>
         <li>value1</li>
         <li>value3
            <p>subvalue</p>
         </li>
        </ul>
        <p>hello world</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
???
foreach (??? as $node)
{
  print $node->nodeName.':'.$node->nodeValue;
}

And get results somewhat like this.

 ul:
 li:value1
 li:value2
 li:value3
 p:subvalue
 p:hello world

Using $doc->childNodes by itself doesn't really do what I want. Since it doesn't seem to go down to lower branches in the tree. I used the code suggested by halfdan and I get results like this.

html:
html:value1
         value1
         value3
            subvalue

        hello world
+1  A: 

Try this:

$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);

function showDOMNode(DOMNode $domNode) {
    foreach ($domNode->childNodes as $node)
    {
        print $node->nodeName.':'.$node->nodeValue;
        if($node->hasChildNodes()) {
            showDOMNode($node);
        }
    }    
}

Cheers,
Fabian

halfdan
@halfdan, Thanks, I have updated my question to be more clear. I don't believe `$doc->childNodes` by itself does what I want. Basically I want to visit each node in the tree, not just see all nodes at one level.
Zoredache
Alright, gimme a sec and I'll update my post.
halfdan