
Hi,

I have an existing XML document with some optional nodes and I want to insert a new node, but at a certain position.

The document looks something like this:

<root>
  <a>...</a>
  ...
  <r>...</r>
  <t>...</t>
  ...
  <z>...</z>
</root>

The new node (<s>...</s>) should be inserted between nodes <r> and <t>, resulting in:

<root>
  <a>...</a>
  ...
  <r>...</r>
  <s>new node</s>
  <t>...</t>
  ...
  <z>...</z>
</root>

The problem is that the existing nodes are optional, so I can't simply use XPath to find node <r> and insert the new node after it.

I would like to avoid the "brute force method" of searching backwards from <r> to <a> until I find a node that exists.

I also want to preserve the order, since the XML document has to conform to an XML schema.

XSLT as well as ordinary XML libraries can be used, but since I'm only using Saxon-B, schema-aware XSLT processing is not an option.

Does anyone have an idea on how to insert such a node?

thx, MyKey_

A: 

You must use a brute force search since you have no static path to find the insert location. My approach would be to use a SAX parser and read the document. All nodes are copied to the output unmodified.

You'll need a flag, sWasWritten, which is why a standard XSLT processor won't do: you need a tool in which you can modify variables.

As soon as I see a child of root that comes after r (t, u, ..., z), or the end tag of the root node, I'd write the s node (unless sWasWritten is already true) and then set sWasWritten.
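A minimal sketch of this streaming approach in Python, using the standard library's xml.sax with an XMLGenerator to echo every event back out. The names ELEMENTS_AFTER, InsertSHandler and insert_s are hypothetical, and, as in the description above, the elements that follow s have to be listed by hand. One caveat: a plain ContentHandler is not told about comments, so comments in the input are not copied through.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Assumption: the children of <root> that the schema places after <s>.
ELEMENTS_AFTER = {"t", "u", "v", "w", "x", "y", "z"}


class InsertSHandler(xml.sax.ContentHandler):
    """Echoes SAX events to an XMLGenerator and emits <s> exactly once."""

    def __init__(self, out, text="new node"):
        super().__init__()
        self.gen = XMLGenerator(out, encoding="utf-8")
        self.text = text
        self.depth = 0               # 1 = inside <root>, 2 = a child of <root>
        self.s_was_written = False   # the flag from the answer

    def _write_s(self):
        if not self.s_was_written:
            self.gen.startElement("s", AttributesImpl({}))
            self.gen.characters(self.text)
            self.gen.endElement("s")
            self.s_was_written = True

    def startDocument(self):
        self.gen.startDocument()

    def endDocument(self):
        self.gen.endDocument()

    def startElement(self, name, attrs):
        self.depth += 1
        # First child of <root> that belongs after <s>: write <s> before it.
        if self.depth == 2 and name in ELEMENTS_AFTER:
            self._write_s()
        self.gen.startElement(name, attrs)

    def endElement(self, name):
        # End tag of <root> reached without seeing t..z: append <s> last.
        if self.depth == 1:
            self._write_s()
        self.gen.endElement(name)
        self.depth -= 1

    def characters(self, content):
        self.gen.characters(content)

    def ignorableWhitespace(self, whitespace):
        self.gen.ignorableWhitespace(whitespace)

    def processingInstruction(self, target, data):
        self.gen.processingInstruction(target, data)


def insert_s(xml_bytes, text="new node"):
    """Parse xml_bytes and return the copied document with <s> inserted."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, InsertSHandler(out, text))
    return out.getvalue()


print(insert_s(b"<root><a>1</a><r>2</r><t>3</t><z>4</z></root>"))
```

For that input, <s>new node</s> ends up between </r> and <t>. If none of t..z is present, it is written just before </root>. Only direct children of root are checked, so a <t> nested deeper does not trigger the insert.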

Aaron Digulla
SAX processing will work as you suggest. But XSLT is quite capable of the task as well (see my answer).
Evan Lenz
A: 

The XML Spec specifies that elements are not in a specific (sorted) order. By using an XSD you can set rules about which elements need to be where in the XML tree and in what quantity, but not in which order.

<set>
  <a /><b />
</set>

or

<set>
  <b /><a />
</set>

result in the same XML DOM.

The order of elements should not matter, since XML is meant for data transfer, not for sorting information (for that you can use things like XSL). Maybe you can find an easier approach by letting go of the order in which the elements are placed in your XML (or go with Aaron Digulla's answer of SAX parsing, mtfbwy).

Mark
This answer is completely untrue. Order matters for elements (by default), never for attributes. XML is for documents too, not just databases. The poster refers to the XML and XSD specs but has clearly never read them and has been misled somewhere along the way.
Evan Lenz
Agree with Evan - an exception is if the XML Schema uses "<all>" - but it sounds like that's not the case for your XML Schema, and as I understand it, you can't modify it to use <all> (which would be another solution.)
13ren
Indeed, an interesting case of an answer accepted because the original poster did not know the correct one...
bortzmeyer
A: 

[Replaced my last answer. Now I understand better what you need.]

Here's an XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/root">
    <xsl:variable name="elements-after" select="t|u|v|w|x|y|z"/>
    <xsl:copy>
      <xsl:copy-of select="* except $elements-after"/>
      <s>new node</s>
      <xsl:copy-of select="$elements-after"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

You have to explicitly list either the elements that come after or the elements that come before. (You don't have to list both.) I would tend to choose the shorter of the two lists (hence "t" - "z" in the above example, instead of "a" - "r").
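For comparison, a rough Python equivalent of the same idea, using xml.etree.ElementTree (my choice of library, not part of the answer): the hand-maintained list names the elements that come after s, and s goes in front of the first of them that exists, or at the end if none do. For input that is already in schema order this gives the same result as the stylesheet; unlike the XSLT, it does not move elements that are out of place. ELEMENTS_AFTER and insert_before_first_of are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: the same hand-maintained list as $elements-after in the XSLT.
ELEMENTS_AFTER = ("t", "u", "v", "w", "x", "y", "z")


def insert_before_first_of(parent, new_elem, names_after):
    """Insert new_elem ahead of the first child whose tag is in names_after.

    If none of those children exist, new_elem becomes the last child,
    mirroring "copy everything else, then <s>, then $elements-after".
    """
    for index, child in enumerate(parent):
        if child.tag in names_after:
            parent.insert(index, new_elem)
            return
    parent.append(new_elem)


root = ET.fromstring("<root><a>1</a><r>2</r><t>3</t><z>4</z></root>")
s = ET.Element("s")
s.text = "new node"
insert_before_first_of(root, s, ELEMENTS_AFTER)
print(ET.tostring(root, encoding="unicode"))
```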

OPTIONAL ENHANCEMENT:

This gets the job done, but now you need to maintain the list of element names in two different places (in the XSLT and in the schema). If it changes much, then they might get out of sync. If you add a new element to the schema but forget to add it to the XSLT, then it won't get copied through. If you're worried about this, you can implement your own sort of schema awareness. Let's say your schema looks like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string"/>
        <xs:element name="r" type="xs:string"/>
        <xs:element name="s" type="xs:string"/>
        <xs:element name="t" type="xs:string"/>
        <xs:element name="z" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

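Now all you need to do is change your definition of the $elements-after variable. The xs prefix used below must also be declared on xsl:stylesheet (xmlns:xs="http://www.w3.org/2001/XMLSchema"), preferably together with exclude-result-prefixes="xs" so the namespace does not end up on the new <s> element: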

  <xsl:variable name="elements-after" as="element()*">
    <xsl:variable name="root-decl" select="document('root.xsd')/*/xs:element[@name eq 'root']"/>
    <xsl:variable name="child-decls" select="$root-decl/xs:complexType/xs:sequence/xs:element"/>
    <xsl:variable name="decls-after" select="$child-decls[preceding-sibling::xs:element[@name eq 's']]"/>
    <xsl:sequence select="*[local-name() = $decls-after/@name]"/>
  </xsl:variable>

This is obviously more complicated, but now you don't have to list any elements (other than "s") in your code. The script's behavior will automatically update whenever you change the schema (in particular, if you were to add new elements). Whether this is overkill or not depends on your project. I offer it simply as an optional add-on. :-)
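The same schema-driven idea can be sketched in Python with xml.etree.ElementTree. This assumes the schema has exactly the shape shown above: a global xs:element for the parent with an inline xs:complexType/xs:sequence of local xs:element declarations. References, named types, xs:choice and nested groups are not handled. The schema is inlined as a string so the example runs without a root.xsd file; SCHEMA, names_after and insert_per_schema are hypothetical names, and the new element's name is taken from the element itself rather than hard-coded.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema from the answer, inlined so the sketch runs without root.xsd.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string"/>
        <xs:element name="r" type="xs:string"/>
        <xs:element name="s" type="xs:string"/>
        <xs:element name="t" type="xs:string"/>
        <xs:element name="z" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def names_after(schema_root, parent_name, new_name):
    """Names declared after new_name in parent_name's xs:sequence."""
    decl = schema_root.find(f"{XS}element[@name='{parent_name}']")
    children = decl.findall(f"{XS}complexType/{XS}sequence/{XS}element")
    names = [c.get("name") for c in children]
    return set(names[names.index(new_name) + 1:])


def insert_per_schema(doc_root, new_elem, schema_root):
    """Put new_elem before the first existing sibling the schema places after it."""
    after = names_after(schema_root, doc_root.tag, new_elem.tag)
    for index, child in enumerate(doc_root):
        if child.tag in after:
            doc_root.insert(index, new_elem)
            return
    doc_root.append(new_elem)


schema = ET.fromstring(SCHEMA)
doc = ET.fromstring("<root><a>1</a><t>3</t></root>")
s = ET.Element("s")
s.text = "new node"
insert_per_schema(doc, s, schema)
print(ET.tostring(doc, encoding="unicode"))
```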

Evan Lenz
This doesn't work when there is no 'r' node (as per the original question: all nodes are optional). How would the template look if you can't rely on any node existing?
Aaron Digulla
Oops, you're right. I had misread the original post. Now I've completely replaced the answer. Thanks.
Evan Lenz
That's really cool. Slight refinement: when deriving $elements-after, use a variable instead of 's', so you can automatically handle inserting after any child of <root>.
13ren
Agreed. The "s" is pretty hidden among all those implementation details.
Evan Lenz
Wow, nice solution. I would never have thought of parsing the schema. Thanks a lot.
MyKey_
@Evan Lenz is it possible to do this somehow from Java using this schema?
London
A: 

An XPath solution:

/root/(.|a|r)[position()=last()]

You must explicitly include all the nodes up to the one you want, so you'll need a different XPath expression for each node you want to insert after. For example, to place it immediately after <t> (if it exists):

/root/(.|a|r|t)[position()=last()]

Note the special case when none of the preceding nodes are present: the expression returns <root> (the "."). You'll need to check for this and insert the new node as the first child of root, instead of after the returned node (the usual case). This isn't so bad: you'd have to handle this special case in some way anyway. Another way to handle it is the following, which returns 0 nodes if there are no preceding nodes.

/root/(.|a|r|t)[position()=last() and position()!=1]
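Outside XPath 2.0 the same "last existing predecessor" rule can be sketched in Python with xml.etree.ElementTree, whose limited XPath support has no (.|a|r) union syntax. The function walks root's children, remembers the position just after the last one whose name is in the hypothetical ELEMENTS_BEFORE list, and falls back to position 0 (first child) in the special case where the XPath would return <root> itself.

```python
import xml.etree.ElementTree as ET

# Assumption: the children of <root> that may precede <s>, as in (.|a|r).
ELEMENTS_BEFORE = ("a", "r")


def insert_after_last_of(parent, new_elem, names_before):
    """Insert new_elem right after the last child named in names_before.

    If none of those children exist (the case where the XPath returns
    <root> itself), new_elem becomes the first child instead.
    """
    position = 0
    for index, child in enumerate(parent):
        if child.tag in names_before:
            position = index + 1   # keep the last match in document order
    parent.insert(position, new_elem)


root = ET.fromstring("<root><a>1</a><r>2</r><t>3</t><z>4</z></root>")
s = ET.Element("s")
s.text = "new node"
insert_after_last_of(root, s, ELEMENTS_BEFORE)
print(ET.tostring(root, encoding="unicode"))
```

To insert after <t> instead, pass ("a", "r", "t"), which corresponds to the /root/(.|a|r|t) variant.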

Challenge: can you find a better way to handle this special case?

13ren
A: 

Thanks for the explanations. Could anybody tell me how to insert some XML between two grandchild nodes? For example, I want to insert ... between ... and ... in this file ... ... ... ...

Sandra