tags:

views:

622

answers:

3

I'm about to delete certain elements in an XML document, using code like the following:

NodeList nodes = ...;
for (int i = 0; i < nodes.getLength(); i++) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    e.getParentNode().removeChild(e);
  }
}

Will this interfere with proper traversal of the NodeList? Any other caveats with this approach? If this is totally wrong, what's the proper way to do it?

+2  A: 

According to the DOM specificaion, the result of a call to node.getElementsByTagName("...") is supposed to be "live", that is, any modification made to the DOM tree will be reflected in the NodeList object. Well, for conforming implementations, that is...

NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects.

(DOM Specification)

So, when you modify the tree structure, a conforming implementation will change the NodeList to reflect these changes.

Dirk
So that would mean my indexes become invalid during the traversal, right?
skiphoppy
A: 

So, given that removing nodes while traversing the NodeList will cause the NodeList to be updated to reflect the new reality, I assume that my indices will become invalid and this will not work.

So, it seems the solution is to keep track of the elements to delete during the traversal, and delete them all afterward, once the NodeList is no longer used.

NodeList nodes = ...;
Set<Element> targetElements = new HashSet<Element>();
for (int i = 0; i < nodes.getLength(); i++) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    targetElements.add(e);
  }
}
for (Element e: targetElements) {
  e.getParentNode().removeChild(e);
}
skiphoppy
Why do you feel the need to do this? Do your criteria depend on the element's siblings? If yes (in other words, if-and-only-if you need to preserve the siblings) then keep a List (there's no need for Set, there aren't going to be duplicates).
kdgregory
Criteria doesn't depend on the siblings, but if I understand the answer above, if I delete node 5 out of 7, suddenly I'll only have 6 nodes in my NodeList, and my for loop will have wrong indices, skipping one Node and then advancing past the end of the list. Please correct me if I'm misunderstanding.
skiphoppy
Why do you say "No need for a Set"? Isn't it the other way around? There's no need for a List, since order doesn't matter, so I just used a Set.
skiphoppy
In XML, order matters.
kdgregory
What this means is that the NodeList will return nodes in order. Removing a node from the middle of that list will shift the indices, but only for nodes that you've not yet visited. And since your loop uses getLength(), you're never going to index off the end of the list.
kdgregory
As for using a List instead of a Set (which is a moot point). Since order matters, you want to preserve that order. In this case, maybe it's not so important, but it almost every other case it will be. Plus, list implementations tend to be more efficient than set implementations.
kdgregory
The order in which I remove the nodes doesn't matter, does it?
skiphoppy
Okay, suppose out of 5 nodes, 0-4 I remove node 2. Now my next iteration will skip the former node 3 (now node 2), assuming I remove nodes during the loop. So don't I have to put them all in a collection (or something) and remove them outside of the loop? I don't understand the points you've been trying to make. So therefore, I'm fearful that I'm missing something. :)
skiphoppy
OK, I see what you're saying now. Count backwards.
kdgregory
Ah, I see! So I just need to change the for loop to for (int i = nodes.getLength() - 1; i >= 0; i--) , and then I won't need the collection? Makes perfect sense. Post it, re-explaining why the original doesn't work, and I'll mark you as the accepted answer. :)
skiphoppy
+1  A: 

The Practical XML library now contains NodeListIterator, which wraps a NodeList and provides full Iterator support (this seemed like a better choice than posting the code that we discussed in the comments). If you don't want to use the full library, feel free to copy that one class: http://practicalxml.svn.sourceforge.net/viewvc/practicalxml/trunk/src/main/java/net/sf/practicalxml/util/NodeListIterator.java?revision=125&amp;view=markup

kdgregory