I have a dynamically generated XML document that represents a tree of categories. Instead of nesting elements, it stores each node's position as a backslash-separated path in the Name attribute, and the records appear in arbitrary order, like this:

   <data>    
      <record ID="24" Name="category 1\sub category 1"/>   
      <record ID="26" Name="category 1"/>     
      <record ID="25" Name="category 1\sub category 1\sub category 2"/>    
      <record ID="27" Name="category 1\sub category 1\sub category 3"/>    
      ...
   </data>

I need to come up with a solution that 'normalizes' the XML so that it is transformed into something like this:

   <data>    
      <record ID="26" Name="category 1">    
         <record ID="24" Name="sub category 1">    
            <record ID="25" Name="sub category 2"/>
            <record ID="27" Name="sub category 3"/>    
         </record>
      </record>   
      ...
   </data>

Is this something XSLT can handle, and if so, how? I would rather not have to do it programmatically.

+15  A: 

Sure, no problem:

<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <xsl:output indent="yes" />

  <xsl:template match="/data">
    <!-- copy the document element -->
    <xsl:copy>
      <!-- That's where we start: all "record" nodes that have no "\". -->
      <xsl:apply-templates mode="recurse" select="/data/record[
        not(contains(@Name, '\'))
      ]" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="record" mode="recurse">
    <xsl:param name="starting-path" select="''" />

    <!-- The record node and its ID attribute can be copied. --> 
    <xsl:copy>
      <xsl:copy-of select="@ID" />

      <!-- Create the new "Name" attribute: the path with the parent prefix removed. -->
      <xsl:attribute name="Name">
        <xsl:value-of select="substring-after(@Name, $starting-path)" />
      </xsl:attribute>

      <!-- Append a backslash to the current path. -->
      <xsl:variable name="current-path" select="concat(@Name, '\')" />

      <!-- Select all "record" nodes that are one level deeper... -->
      <xsl:variable name="children" select="/data/record[
        starts-with(@Name, $current-path)
        and
        not(contains(substring-after(@Name, $current-path), '\'))
      ]" />

      <!-- ...and apply this template to them. -->
      <xsl:apply-templates mode="recurse" select="$children">
        <xsl:with-param name="starting-path" select="$current-path" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Output on my system:

<data>
  <record ID="26" Name="category 1">
    <record ID="24" Name="sub category 1">
      <record ID="25" Name="sub category 2"></record>
      <record ID="27" Name="sub category 3"></record>
    </record>
  </record>
</data>

Note that the whole solution is based on the assumption that all the paths are canonical and do not contain trailing backslashes.
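
If the input cannot be trusted to meet that assumption, one option is a separate clean-up pass that runs before the main stylesheet. The following is only a minimal sketch of that idea, not part of the solution above: it is an identity transform with one override that strips a single trailing backslash from each Name attribute. It does not repair other non-canonical forms, such as doubled separators or several trailing backslashes.

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <!-- Identity template: copy every node and attribute unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Override: a Name attribute whose last character is a backslash
       is rewritten without that last character. -->
  <xsl:template match="record/@Name[substring(., string-length(.)) = '\']">
    <xsl:attribute name="Name">
      <xsl:value-of select="substring(., 1, string-length(.) - 1)" />
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

The output of this pass would then be fed to the main stylesheet as its input document.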

Also note that any unmatched/orphaned "record" elements will not be in the output (unless they are at the root level, of course).
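
To find out which records would be dropped this way, a separate diagnostic stylesheet can list them. The sketch below is an illustration under the same canonical-path assumption, and the "orphans" wrapper element is a made-up name. It reuses the child test from the main template in reverse: for each record whose Name contains a backslash, it looks for a record whose Name plus a backslash is a prefix of it with exactly one path step remaining. Records without such a direct parent are copied to the output. Descendants of an orphan that do have their own direct parent are not listed, although they are missing from the main output as well.

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <xsl:output indent="yes" />

  <xsl:template match="/data">
    <!-- "orphans" is a hypothetical wrapper element for this report. -->
    <orphans>
      <!-- Root-level records (no backslash) can never be orphans. -->
      <xsl:for-each select="record[contains(@Name, '\')]">
        <!-- Remember the current path; inside the predicate below,
             @Name refers to the candidate parent instead. -->
        <xsl:variable name="name" select="@Name" />

        <!-- A direct parent is a record whose path plus '\' prefixes
             the current path, leaving a single step after it. -->
        <xsl:variable name="parent" select="/data/record[
          starts-with($name, concat(@Name, '\'))
          and
          not(contains(substring-after($name, concat(@Name, '\')), '\'))
        ]" />

        <xsl:if test="not($parent)">
          <xsl:copy-of select="." />
        </xsl:if>
      </xsl:for-each>
    </orphans>
  </xsl:template>

</xsl:stylesheet>
```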

One more thing: the template mode ("recurse") is not strictly necessary. I included it because this template does something rather special, and your stylesheet may already contain another template that matches "record" nodes. With the mode in place, this solution can be dropped in without breaking anything else. In a standalone stylesheet, the mode attributes can simply be removed.

Oh, and one last thing: if you want the result document to be ordered by Name, put an <xsl:sort> element inside each <xsl:apply-templates> (both occurrences), like so:

<xsl:apply-templates select="...">
  <xsl:sort select="@Name" data-type="text" order="ascending" />
</xsl:apply-templates>
Tomalak
It works perfectly. What an amazing response, thank you so much!
mysomic
You are welcome. :-)
Tomalak
Man, you just rock!
Cerebrus