
I got embroiled in a discussion about DOM implementation quirks yesterday, which gave rise to an interesting question about how Text.splitText and Element.normalize should behave.

In DOM Level 1 Core, Text.splitText is defined as...

Breaks this Text node into two Text nodes at the specified offset, keeping both in the tree as siblings. This node then only contains all the content up to the offset point. And a new Text node, which is inserted as the next sibling of this node, contains all the content at and after the offset point.

Element.normalize is defined as...

Puts all Text nodes in the full depth of the sub-tree underneath this Element into a "normal" form where only markup (e.g., tags, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are no adjacent Text nodes. This can be used to ensure that the DOM view of a document is the same as if it were saved and re-loaded, and is useful when operations (such as XPointer lookups) that depend on a particular document tree structure are to be used.

So, if I take a text node containing "Hello World", referenced in textNode, and do

textNode.splitText(5)

textNode now has the content "Hello", and a new sibling node contains " World".
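To make the split concrete, here is a small sketch that uses Python's standard-library `xml.dom.minidom` purely as an example DOM implementation (an assumption for illustration; the question is about the spec, not about minidom). The offset 5 is the position just after "Hello", so the new node starts with the space.

```python
from xml.dom.minidom import parseString

doc = parseString("<p>Hello World</p>")
p = doc.documentElement
text_node = p.firstChild  # the single Text node "Hello World"

# Split at offset 5: text_node keeps [0, 5), the new node gets [5, end).
new_node = text_node.splitText(5)

print(repr(text_node.data))               # 'Hello'
print(repr(new_node.data))                # ' World'
print(text_node.nextSibling is new_node)  # True: inserted as the next sibling
print(len(p.childNodes))                  # 2
```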

If I then

textNode.parentNode.normalize()

what is textNode? The specification doesn't make it clear that textNode still has to be a child of its previous parent, just updated to contain the values of all adjacent Text nodes (which are then removed). It seems to me that it would be conformant behaviour to remove all the adjacent Text nodes and then create a new node with the concatenation of their values, leaving textNode pointing to something that is no longer part of the tree. Alternatively, textNode could be updated in the same fashion as in splitText, so that it retains its tree position and gets a new value.
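Since the spec leaves this open, one option is simply to probe the implementation you are using. The sketch below again uses Python's `xml.dom.minidom` as an example DOM (an assumption, not something the question depends on) and checks whether the original node is still attached after normalizing. CPython's minidom reuses the first Text node of each run, so this prints `True 'Hello World' 1`; other implementations would need to be probed the same way.

```python
from xml.dom.minidom import parseString

doc = parseString("<p>Hello World</p>")
parent = doc.documentElement
text_node = parent.firstChild
text_node.splitText(5)  # now two adjacent Text nodes: "Hello" and " World"

parent.normalize()

# Option 2 from the question: text_node stays in the tree with the joined value.
# Option 1 would leave text_node detached and put a fresh node in its place.
still_attached = text_node.parentNode is parent and text_node in parent.childNodes
print(still_attached, repr(text_node.data), len(parent.childNodes))
```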

The two behaviours are really quite different, and I can't find anything clarifying which is correct, or whether this is simply an oversight in the specification (it doesn't seem to be clarified in Levels 2 or 3 either). Can any DOM/XML gurus out there shed some light?

A:

While it would seem like a reasonable assumption, I agree that it is not made explicit in the specification. All I can add is that, the way I read it, either textNode or its new sibling (i.e. the return value from splitText) would contain the new joined value: the statement specifies that all Text nodes in the sub-tree are put into normal form, not that the sub-tree is rebuilt into a new structure. I guess the only safe thing is to keep a reference to the parent before normalising.

Sam Brightman
A:

I think all bets are off here; I certainly wouldn't depend on any given behaviour. The only safe thing to do is to get the node from its parent again.
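A minimal sketch of that defensive approach, once more using Python's `xml.dom.minidom` as a stand-in DOM. The helper `normalize_and_refetch` is hypothetical. It keeps a reference to the parent and to the nearest preceding non-Text sibling, relying on the assumption that normalize never merges markup nodes, so that sibling is a stable anchor. It also assumes the run of adjacent Text nodes is not entirely empty, since some implementations (minidom included) discard empty Text nodes during normalization.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def normalize_and_refetch(text_node):
    """Hypothetical helper: normalize text_node's parent and return the Text
    node that now holds its content, without trusting text_node itself."""
    parent = text_node.parentNode  # keep the parent before normalizing
    # Walk back over adjacent Text siblings to the nearest non-Text node
    # (element, comment, CDATA section...); those are never merged.
    anchor = text_node.previousSibling
    while anchor is not None and anchor.nodeType == Node.TEXT_NODE:
        anchor = anchor.previousSibling
    parent.normalize()
    # The merged Text node is whatever now follows the anchor.
    return anchor.nextSibling if anchor is not None else parent.firstChild


doc = parseString("<p><b>x</b>Hello World</p>")
p = doc.documentElement
text = p.childNodes[1]
text.splitText(5)
merged = normalize_and_refetch(text)
print(merged.data)  # Hello World
```

This works whether the implementation reuses the original node or replaces it, because the result is fetched from the tree rather than from the old reference.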

Adrian Mouat
A:

I was on the DOM Working Group in the early days; I'm sure we meant for textNode to contain the new joined value, but if we didn't say it in the spec, it's possible that some implementation might create a new node instead of reusing textNode, though that would require more work for the implementors.

When in doubt, program defensively.

David Singer