I'm using PHP's DOMDocument and related classes to work with XML. The XML contains processing instructions that must be processed in a specific order, from top to bottom, deepest to shallowest.

I'm using a recursive function that takes the entire DOMDocument and then calls itself for each child node, all the way down.
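A minimal sketch of that kind of recursive, deepest-first walk, assuming the example XML shown below. The function name `walk()` is hypothetical, and recording each element's `id` stands in for the real processing. It only shows the visiting order and deletes nothing:

```php
<?php
// Hypothetical sketch: post-order walk, so each element is handled only after
// all of its descendants, and siblings are visited top to bottom.
function walk(DOMNode $node, array &$order): void
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            walk($child, $order); // descend first (deepest before shallowest)
        }
    }
    if ($node instanceof DOMElement) {
        $order[] = $node->getAttribute('id'); // stand-in for the real processing
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root id="1"><instruction id="2"><inner id="3"/></instruction>'
    . '<instruction id="4"><inner id="5"/></instruction></root>');
$order = [];
walk($doc, $order);
echo implode(',', $order), "\n"; // 3,2,5,4,1
```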

I need to delete some of these child nodes after I process them. The problem is that as soon as a child node is deleted, it seems the foreach is "out of whack" and I can't delete any more child nodes.

Example XML doc:

<root id="1">
    <instruction id="2">
        <inner id="3"/>
    </instruction>
    <instruction id="4">
        <inner id="5"/>
    </instruction>
</root>

I need to process inner[@id=3] first, then delete it. Then I need to process instruction[@id=2] and delete it. Up to this point, everything works.

Next I need to process inner[@id=5] and delete it. I can read/process it ok, but when I try to remove it:

$parentNode=$inner->parentNode;
$parentNode->removeChild($inner);

Nothing happens. It seems that removing the first instruction node has confused PHP about which elements the document now contains.

I know it's possible to process the XML from bottom to top and remove all of the nodes in that order, but I have a specific need to go from top-to-bottom.

One other piece of information: I'm also adding some new nodes to the document while processing it, in case that changes my options.

What do I need to do to get this to work?

A: 

I must be blind. Somehow I missed the discussion here about this exact problem: http://www.php.net/manual/en/domnode.removechild.php
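One workaround for this kind of problem is to copy the live node list into a plain PHP array before looping, so that removing nodes cannot disturb the iteration. A minimal sketch, reusing the example XML from the question and assuming every processed element except the document element should be deleted (`walkAndDelete()` is a hypothetical name):

```php
<?php
// Hypothetical sketch: deepest-first walk where each childNodes list is first
// copied into an ordinary array, so deleting nodes cannot upset the loop.
function walkAndDelete(DOMNode $node, array &$order): void
{
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMElement) {
            walkAndDelete($child, $order);
        }
    }
    if ($node instanceof DOMElement) {
        $order[] = $node->getAttribute('id'); // stand-in for the real processing
        // Assumption: keep the document element, delete everything else.
        if (!($node->parentNode instanceof DOMDocument)) {
            $node->parentNode->removeChild($node);
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root id="1"><instruction id="2"><inner id="3"/></instruction>'
    . '<instruction id="4"><inner id="5"/></instruction></root>');
$order = [];
walkAndDelete($doc, $order);
echo implode(',', $order), "\n";                 // 3,2,5,4,1
echo $doc->saveXML($doc->documentElement), "\n"; // <root id="1"/>
```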

Apologies for the unnecessary clutter.

Tex
A: 

A lot of the DOM structures are internally just arrays. Deleting elements shifts things around, because the keys/pointers stored in returned DOMNodeList objects will refer to array offsets that no longer exist or that now contain something completely different.
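The offset shift is easy to see with the list returned by `getElementsByTagName()`, which PHP keeps live: after the first item is removed, index 0 refers to a different node. A small demonstration using the question's element names (the trimmed-down XML is an illustrative assumption):

```php
<?php
// getElementsByTagName() returns a live DOMNodeList: it reflects later changes.
$doc = new DOMDocument();
$doc->loadXML('<root id="1"><instruction id="2"/><instruction id="4"/></root>');
$list = $doc->getElementsByTagName('instruction');

$first = $list->item(0);
echo $first->getAttribute('id'), "\n";          // 2
$first->parentNode->removeChild($first);

// The list has shrunk and the remaining item has moved down one offset.
echo $list->length, "\n";                       // 1
echo $list->item(0)->getAttribute('id'), "\n";  // 4
```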

To delete DOM nodes without upsetting these various pointers, you have to work from the end of the array towards the beginning. Instead of deleting the nodes in place, store them in a stack-type structure; after you've finished operating on a particular level of the tree, pop everything off the stack and do the deletions then. At that point any dangling pointers are moot, because you've already completed your work.
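A minimal sketch of the deferred-deletion idea, assuming the question's example XML. `processLevel()` is a hypothetical name, recording the `id` stands in for real processing, and the document element itself is left alone here:

```php
<?php
// Hypothetical sketch of deferred deletion: nodes are pushed onto a stack while
// one level of the tree is processed and only removed after that loop ends.
function processLevel(DOMElement $parent, array &$order): void
{
    $toDelete = []; // stack of processed children awaiting removal
    foreach ($parent->childNodes as $child) {
        if (!($child instanceof DOMElement)) {
            continue;
        }
        processLevel($child, $order);          // deeper levels first
        $order[] = $child->getAttribute('id'); // stand-in for the real processing
        $toDelete[] = $child;                  // defer the removal
    }
    // The loop over this level is finished, so removing nodes is now harmless.
    while (($node = array_pop($toDelete)) !== null) {
        $parent->removeChild($node);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root id="1"><instruction id="2"><inner id="3"/></instruction>'
    . '<instruction id="4"><inner id="5"/></instruction></root>');
$order = [];
processLevel($doc->documentElement, $order);
echo implode(',', $order), "\n";                 // 3,2,5,4
echo $doc->saveXML($doc->documentElement), "\n"; // <root id="1"/>
```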

Marc B
You're absolutely right, Marc. I'm also considering putting a flag on processed elements so I can reload the entire document after I remove an element. Reloading would reset all of the arrays, and the flag would let me skip over processed elements and pick up where I left off.
Tex
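A minimal sketch of the flag-and-reload idea from the comment above, assuming the question's example XML. The `processed` attribute is an illustrative marker chosen here, not anything DOMDocument defines, and this version only flags elements rather than deleting them:

```php
<?php
// Hypothetical sketch of flag-and-reload. The "processed" attribute is an
// illustrative marker, not something DOMDocument defines.
$xml = '<root id="1"><instruction id="2"><inner id="3"/></instruction>'
    . '<instruction id="4"><inner id="5"/></instruction></root>';
$order = [];

while (true) {
    $doc = new DOMDocument();
    $doc->loadXML($xml);          // fresh parse: no node lists left over
    $xpath = new DOMXPath($doc);
    // First unprocessed element, in document order, whose element children are
    // all processed already: this yields deepest-first, top-to-bottom order.
    $node = $xpath->query('//*[not(@processed)][not(*[not(@processed)])]')->item(0);
    if ($node === null) {
        break;                    // everything has been handled
    }
    $order[] = $node->getAttribute('id'); // stand-in for the real processing
    $node->setAttribute('processed', '1');
    $xml = $doc->saveXML();       // serialize, then reload on the next pass
}

echo implode(',', $order), "\n";  // 3,2,5,4,1
```

This trades speed for simplicity: the whole document is reparsed once per processed element, which may matter for large files.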