
I've just started tinkering with XML manipulation in PHP, and I've stumbled onto something unexpected. Here's the XML I'm using as test input:

<list>
    <activity1> running </activity1>
    <activity2> swimming </activity2>
    <activity3> soccer </activity3>
</list>

Now, I was expecting this PHP code to output 'activity1':

$xmldoc = new DOMDocument();
$xmldoc->load('file.xml');

//the line below would make $root the <list> node
$root = $xmldoc->firstChild;

//the line below would make $cnode the first child 
//of the <list> node, which is <activity1>
$cnode = $root->firstChild;

//this should output 'activity1'
echo 'element name: ' . $cnode->nodeName;

Instead, this code outputs #text. I could fix that by adding one more line of code before printing the node name:

$cnode = $cnode->nextSibling;

I would have expected that to print 'activity2', but it prints 'activity1'. What is going on?

A:

The first child node is the text (in this case, whitespace) between the opening <list> tag and the <activity1> tag; the next node is the <activity1> element. Elements are not the same as nodes: an element is just one kind of node, alongside text nodes, comments, and so on.
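To see this for yourself, here is a minimal sketch that lists every child node of <list> by nodeName. It inlines the question's XML and uses loadXML() instead of load('file.xml') so it runs on its own; documentElement is the same node as $xmldoc->firstChild for this file, since nothing precedes the root element.

```php
<?php
// The question's sample XML, inlined instead of loaded from file.xml
$xml = "<list>\n    <activity1> running </activity1>\n    <activity2> swimming </activity2>\n    <activity3> soccer </activity3>\n</list>";

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);

// Collect the nodeName of every child node of <list>, not just the elements
$names = [];
foreach ($xmldoc->documentElement->childNodes as $child) {
    // Text nodes report "#text"; element nodes report their tag name
    $names[] = $child->nodeName;
}
echo implode(', ', $names), "\n";
// #text, activity1, #text, activity2, #text, activity3, #text
```

This is also why adding $cnode = $cnode->nextSibling; printed 'activity1': the first child is the whitespace text node, and its next sibling is the <activity1> element.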

Draemon
I think I understand what's going on now. Thanks.
A:

To get the behaviour you expected, pass LIBXML_NOBLANKS as the second ($options) parameter of your load() call:

<?php
$xmldoc = new DOMDocument();
$xmldoc->load('file.xml', LIBXML_NOBLANKS);
?>
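A self-contained sketch of this, again inlining the XML and using loadXML(), which takes the same $options argument as load(). The second variant sets the DOMDocument::$preserveWhiteSpace property to false before loading, which, as far as I know, has the same effect of dropping whitespace-only text nodes.

```php
<?php
// The question's sample XML, inlined so the sketch is self-contained
$xml = "<list>\n    <activity1> running </activity1>\n    <activity2> swimming </activity2>\n    <activity3> soccer </activity3>\n</list>";

// Option 1: pass LIBXML_NOBLANKS as the parser options
$doc1 = new DOMDocument();
$doc1->loadXML($xml, LIBXML_NOBLANKS);
$name1 = $doc1->documentElement->firstChild->nodeName;

// Option 2: turn off preserveWhiteSpace before loading
$doc2 = new DOMDocument();
$doc2->preserveWhiteSpace = false;
$doc2->loadXML($xml);
$name2 = $doc2->documentElement->firstChild->nodeName;

echo $name1, "\n", $name2, "\n"; // activity1 (printed twice)
```

Either way the whitespace between elements is gone, so firstChild lands on <activity1> directly.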
Czimi
A: 

A note on Czimi's answer: removing whitespace-only nodes will not spare you from checking the node type (element, text node, comment, ...). In general, if you're only interested in element nodes, you'll want to do something like this:

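// Advance until we reach an element node (XML_ELEMENT_NODE, i.e. 1)
// or there are no more siblings to try
while ($nodeInQuestion->nodeType !== XML_ELEMENT_NODE && $nodeInQuestion->nextSibling) {
    $nodeInQuestion = $nodeInQuestion->nextSibling;
}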

This is sort of pseudo-code. Obviously you'll need to handle failure somehow if you're looking for an element and reach the end of the parentNode's childNodes before finding one: in that case the loop simply stops on the last sibling, which may not be an element at all.
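One way to handle that failure case is to let the loop run off the end and return null. In this sketch, firstElementFrom() is a hypothetical helper name, not part of PHP's DOM API:

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): starting at $node, walk
// forward through its siblings and return the first element node found,
// or null if the end of the parent's childNodes is reached first.
function firstElementFrom(?DOMNode $node): ?DOMElement
{
    while ($node !== null && $node->nodeType !== XML_ELEMENT_NODE) {
        $node = $node->nextSibling;
    }
    return $node; // a DOMElement, or null when no element was found
}

$doc = new DOMDocument();
$doc->loadXML("<list>\n    <!-- a comment -->\n    <activity1> running </activity1>\n</list>");

// Skips the leading whitespace text nodes and the comment node
$first = firstElementFrom($doc->documentElement->firstChild);
echo $first === null ? "no element\n" : $first->nodeName . "\n"; // activity1

// lastChild is the trailing whitespace text node; nothing follows it
$none = firstElementFrom($doc->documentElement->lastChild);
var_dump($none); // NULL
```

If you are on PHP 8.0 or later, the DOM extension is documented to provide firstElementChild and nextElementSibling properties that skip non-element nodes for you, which may make a helper like this unnecessary.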

eyelidlessness
A:

If you use XPath to query your document, you don't need to worry about this kind of arcana. Use DOMXPath::query() to evaluate the pattern /list/* (xpath_eval() belongs to the older PHP 4 domxml API, not the DOMDocument class used in the question), and all you'll get back are the child elements of the top-level list element, no matter what.
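A minimal sketch of that approach with DOMXPath, again inlining the sample XML for loadXML(). The * node test only matches elements, so the whitespace text nodes are never returned:

```php
<?php
$xml = "<list>\n    <activity1> running </activity1>\n    <activity2> swimming </activity2>\n    <activity3> soccer </activity3>\n</list>";

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// '/list/*' selects only the element children of the root <list>
$names = [];
foreach ($xpath->query('/list/*') as $element) {
    $names[] = $element->nodeName;
}
echo implode(', ', $names), "\n"; // activity1, activity2, activity3
```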

Robert Rossney