I have XML that looks something like this:

<Root xmlns="http://widgetspecA.com/ns">
  ...any...
  <WidgetBox>
    <A/>
    <B/>
    <SmallWidget> <!-- minOccurs='0' -->
      ...any...
    </SmallWidget>
    <Widgets> <!-- minOccurs='0' -->
      ...any...
    </Widgets>
    ...any...
  </WidgetBox>
  ...any...
</Root>

and I want to transform it into this:

<Root xmlns="http://widgetspecB.com/ns">
  ...any...
  <WidgetBox>
    <A/>
    <B/>
    <Widgets>
      <Atom>
        ...any...
      </Atom>
      <Molecule>
        ...any...
      </Molecule>
    </Widgets>
    ...any...
  </WidgetBox>
  ...any...
</Root>

In other words:

<SmallWidget> in specA means the same thing as <Atom> in specB, so just rename the element.

<Widgets> in specA means the same thing as <Molecule> in specB, so just rename the element.

Wrap <Atom> and <Molecule> in an element named <Widgets>, which means something different from specA's <Widgets>.

Everything else gets copied as is, but in the new namespace.

What would the XSLT be for this?

SOLUTION?: In the end I went with this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:old="http://widgetspecA.com/ns"
    xmlns="http://widgetspecB.com/ns"
    exclude-result-prefixes="old">

  <xsl:output method="xml"/>

  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="old:SmallWidget" mode="single">
    <Atom>
      <xsl:apply-templates/>
    </Atom>
  </xsl:template>

  <xsl:template match="old:Widgets" mode="single">
      <Molecule>
        <xsl:apply-templates/>
      </Molecule>
  </xsl:template>

  <xsl:template match="old:SmallWidget[following-sibling::old:Widgets]">
      <Widgets>
       <xsl:apply-templates select="self::node()" mode="single"/>
       <xsl:apply-templates select="following-sibling::old:Widgets" mode="single"/>
      </Widgets>
  </xsl:template>

  <xsl:template match="old:Widgets[preceding-sibling::old:SmallWidget]"/>

  <xsl:template match="old:SmallWidget[not(following-sibling::old:Widgets)]">
      <Widgets>
       <xsl:apply-templates select="self::node()" mode="single"/>
      </Widgets>
  </xsl:template>

  <xsl:template match="old:Widgets[not(preceding-sibling::old:SmallWidget)]">
      <Widgets>
       <xsl:apply-templates select="self::node()" mode="single"/>
      </Widgets>
  </xsl:template>

</xsl:stylesheet>
A:

A good XSLT solution will map your human-readable rules to simple template rules. Here are the rules, in your words:

  1. <SmallWidget> in specA means the same thing as <Atom> in specB, so just rename the element.
  2. <Widgets> in specA means the same thing as <Molecule> in specB, so just rename the element.
  3. Wrap <Atom> and <Molecule> in an element named <Widgets>, which means something different from specA's <Widgets>.
  4. Everything else gets copied as is, but in the new namespace.

Let's give it a go:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:in="http://widgetspecA.com/ns"
  xmlns="http://widgetspecB.com/ns"
  exclude-result-prefixes="in">

  <!-- 1. Rename <SmallWidget> -->
  <xsl:template mode="rename" match="in:SmallWidget">Atom</xsl:template>

  <!-- 2. Rename <Widgets> -->
  <xsl:template mode="rename" match="in:Widgets">Molecule</xsl:template>

  <!-- 3. Wrap <Atom> & <Molecule> with <Widgets> -->
  <xsl:template match="in:SmallWidget">
    <!-- ASSUMPTION: in:Widgets immediately follows in:SmallWidget -->
    <Widgets>
      <xsl:apply-templates mode="convert" select="."/>
      <xsl:apply-templates mode="convert" select="following-sibling::in:Widgets"/>
    </Widgets>
  </xsl:template>

          <!-- Skip by this in regular processing;
               it gets explicitly converted inside <Widgets> (see above) -->
          <xsl:template match="in:Widgets"/>

          <!-- Also, don't copy whitespace appearing
               immediately before in:Widgets -->
          <xsl:template match="text()
                               [following-sibling::node()[1][self::in:Widgets]]"/>


  <!-- 4: Everything copied as is, but in the new namespace -->

    <!-- Copy non-element nodes as is -->
    <xsl:template match="@* | text() | comment() | processing-instruction()">
      <xsl:copy/>
    </xsl:template>

    <!-- By default, just convert elements to new namespace
         (exceptions under #3 above) -->
    <xsl:template match="*">
      <xsl:apply-templates mode="convert" select="."/>
    </xsl:template>

            <xsl:template mode="convert" match="*">
              <!-- Optionally rename the element -->
              <xsl:variable name="name">
                <xsl:apply-templates mode="rename" select="."/>
              </xsl:variable>
              <xsl:element name="{$name}">
                <xsl:apply-templates select="@* | node()"/>
              </xsl:element>
            </xsl:template>

                    <!-- By default, just use the same local
                         name as in the input document -->
                    <xsl:template mode="rename" match="*">
                      <xsl:value-of select="local-name()"/>
                    </xsl:template>

</xsl:stylesheet>

Note that it's important that you use the local-name() function and not the name() function. If you use name(), your stylesheet will break if your input document starts using a namespace prefix that isn't explicitly declared in your stylesheet (unless you add the namespace attribute to <xsl:element> to enforce the namespace even when a prefix appears). However, if we use local-name(), we're safe; it won't ever include the prefix, so the result element will adopt our stylesheet's default namespace.
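For completeness, here is a minimal sketch of the alternative mentioned in parentheses: keeping name() but pinning the result namespace with the namespace attribute of <xsl:element>. It is a standalone illustration rather than part of the stylesheet above, and it assumes you want every element moved into the specB namespace regardless of the namespace it started in.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Recreate every element in the specB namespace. Because the
       namespace is given explicitly, a prefix coming from name()
       does not have to be declared anywhere in this stylesheet. -->
  <xsl:template match="*">
    <xsl:element name="{name()}" namespace="http://widgetspecB.com/ns">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Copy attributes, text, comments, and PIs unchanged -->
  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

With this variant a processor may reuse the input's prefix in the output, but it is not required to, so don't rely on a particular prefix showing up in the result.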

Running the above stylesheet against your sample input document yields exactly what you requested:

<Root xmlns="http://widgetspecB.com/ns">...any...<WidgetBox>...any...
  <Widgets><Atom>
    ...any...
  </Atom><Molecule>
    ...any...
  </Molecule></Widgets>...any...
</WidgetBox>...any...</Root>

Let me know if you have any questions. Ain't XSLT powerful!

P.S. If I wanted to be really precise on replicating the whitespace as in your example, I could have used step-wise, "chain" processing, where I apply templates to just one node at a time and each template rule is responsible for continuing processing onto its following sibling. But that seemed like overkill for this situation.
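To give an idea of what that "chain" style looks like, here is a bare-bones sketch. It is only an identity transform written step-wise, not the widget conversion itself: each rule copies the current node, descends to its first child, and then hands control to the next sibling. A real conversion would add more specific rules on top of this skeleton.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Start the chain at the first child of the root node -->
  <xsl:template match="/">
    <xsl:apply-templates select="node()[1]"/>
  </xsl:template>

  <!-- Process exactly one node, then continue the chain -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Descend: only the first child; it will pull in its siblings -->
      <xsl:apply-templates select="node()[1]"/>
    </xsl:copy>
    <!-- Move on: only the immediately following sibling -->
    <xsl:apply-templates select="following-sibling::node()[1]"/>
  </xsl:template>

</xsl:stylesheet>
```

Because each rule decides for itself whether and how to continue, a rule for one element can consume its neighbor (and the whitespace in between) without needing separate "skip this node" rules.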

UPDATE: The new solution you posted is very reasonable. It can be simplified some though. I've taken your new solution and made some recommended changes below, along with comments indicating what I changed and why I made those changes.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:old="http://widgetspecA.com/ns"
    xmlns="http://widgetspecB.com/ns"
    exclude-result-prefixes="old">

  <!-- "xml" is the default; no real need for this
  <xsl:output method="xml"/>
  -->

  <!-- This works fine if you only want to copy elements, attributes,
       and text. Just be aware that comments and PIs will get
       effectively stripped out, because the default template rule
       for those is to do nothing.
       Also switched name() to local-name(), so that a prefixed
       element in the input can't break the stylesheet.
  -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="old:SmallWidget" mode="single">
    <Atom>
      <xsl:apply-templates/>
    </Atom>
  </xsl:template>

  <xsl:template match="old:Widgets" mode="single">
      <Molecule>
        <xsl:apply-templates/>
      </Molecule>
  </xsl:template>

  <!-- You actually only need one rule for <old:SmallWidget>.
       Why? Because the behavior of this rule will always
       be exactly the same as the behavior of the other rule
       you supplied below.
  -->
  <xsl:template match="old:SmallWidget"> <!--[following-sibling::old:Widgets]">-->
      <Widgets>
                      <!-- "." means exactly the same thing as "self::node()" -->
       <xsl:apply-templates select="." mode="single"/>

       <!-- If the node-set is empty, then this will be a no-op anyway,
            so it's safe to have it here even for the case when
            <old:Widgets> is not present in the source tree. -->
                                    <!-- This XPath expression ensures
                                         that you only process the next
                                         sibling element - and then only
                                         if its name is <old:Widgets>.
                                         Your schema might not allow it,
                                         but this is a clearer communication
                                         of your intention, and it will also
                                         work correctly if another
                                         old:SmallWidget/old:Widgets pair
                                         appeared later in the document.
                                    -->
       <xsl:apply-templates select="following-sibling::*[1][self::old:Widgets]"
                            mode="single"/>
      </Widgets>
  </xsl:template>

                                  <!-- updated this predicate for the
                                       same reason as above. Answers the
                                       question: Is the element right before
                                       this one a SmallWidget? (as opposed to:
                                       Are there any SmallWidget elements
                                       before this one?) -->
  <xsl:template match="old:Widgets[preceding-sibling::*[1][self::old:SmallWidget]]"/>

  <!-- Removed, because this rule effectively has the same behavior as the other one above
  <xsl:template match="old:SmallWidget[not(following-sibling::old:Widgets)]">
      <Widgets>
       <xsl:apply-templates select="self::node()" mode="single"/>
      </Widgets>
  </xsl:template>
  -->

  <!-- no need for the predicate. The format of this pattern (just a name)
       causes this template rule's priority to be 0. Your other rule
       for <old:Widgets> above has priority of .5, which means that it
       will override this one automatically. You don't need to repeat
       the constraint. Alternatively, you could keep this predicate
       and remove the other one. Either way it will work. (It's probably
       a good idea to place these rules next to each other though,
       so you can read it like an if/else statement) -->
  <xsl:template match="old:Widgets">  <!--[not(preceding-sibling::*[1][self::old:SmallWidget])]">-->
      <Widgets>
       <xsl:apply-templates select="." mode="single"/>
      </Widgets>
  </xsl:template>

</xsl:stylesheet>
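If you do want comments and processing instructions to survive, one extra rule is enough. Here is a minimal sketch, shown as a standalone stylesheet with just the default element rule (written here with local-name()) plus the added rule; in practice you would drop the second template into the stylesheet above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://widgetspecB.com/ns">

  <!-- Default element rule: recreate the element in the new namespace -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Added rule: pass comments and PIs through unchanged,
       overriding the built-in rule that drops them -->
  <xsl:template match="comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes need no rule of their own here, since the built-in rule for text already copies them to the output.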
Evan Lenz
What if both SmallWidget and Widgets are optional elements in the source schema?
Wayne
I see that your final solution handles this scenario already. I've appended your code to my edited answer and made changes to help you simplify it (and possibly make it more robust).
Evan Lenz
Yes, that's much simpler. Thanks for helping me scrape some of the rust off my XSL skills...
Wayne