tags:

views:

464

answers:

2

Hi

I wonder if somebody can help me flatten an XML file to CSV format. I have worked on this all day today, and although I've found some simple examples, my issue is slighlty different. Please see example below...

Example XML:

<data>
    <val_A>1</val_A>
    <val_A>2</val_A>
    <val_B>3</val_B>
    <val_B>4</val_B>
    <val_C>
        <val_D>5</val_D>
        <val_D>6</val_D>
    </val_C>
    <val_E>
        <val_F>7</val_F>
        <val_F>8</val_F>
    </val_E>
</data>

Expected Output:

val_A,val_B,val_C,val_D,val_E,val_F
1,3,,5,,7
1,3,,5,,8
1,3,,6,,7
1,3,,6,,8
1,4,,5,,7
1,4,,5,,8
1,4,,6,,7
1,4,,6,,8
2,3,,5,,7
2,3,,5,,8
2,3,,6,,7
2,3,,6,,8
2,4,,5,,7
2,4,,5,,8
2,4,,6,,7
2,4,,6,,8

Notice that the data node has several different structure of child node. Is it possible to write XSL to flatten this?

Second question is it possible to write the XSL in a generic way so that it doesn't need to be aware of node names and therefore just works for any XML fed in?

Any help would be greatly appreciated.

Thanks David

+1  A: 

It looks like you want to output all possible combination of node values for each node of a given name.

So, each of the nodes val_A, val_B, val_D and val_F have 2 occurences, whereas nodes val_C and val_E have 1 occurence. This would give 2 x 2 x 1 x 2 x 1 x 2 = 16 combinations.

I could only think of doing this using a two stage process in XSLT. Because you have stated VB.Net, I am assuming you will be able to use the Microsoft extension functions in XSLT, allowing you to create node-sets.

The first step would to be create a variable containing a node-set of unique node names in the XML. This variable (called nodeNames in the example below) would hold a list of nodes like so

<node>val_A</node>
<node>val_B</node>
<node>val_C</node>
....

I have used meunchian grouping to achieve this.

The next step would be to recursively call a named template, once for each possible node name in the XML (using the node-set created previously). A string is built up for each possible value of the nodes matching the name.

Here is the XSLT I used.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
   <xsl:output method="html"/>
   <xsl:key name="nodes" match="node()" use="local-name()"/>

   <!-- Variable holding unique node names -->
   <xsl:variable name="nodeNames">
      <xsl:apply-templates select="/*/*" mode="name"/>
   </xsl:variable>

   <!-- Use meunchian groupings to build up a node-set of unique node names -->
   <xsl:template match="*" mode="name">
      <xsl:if test="generate-id() = generate-id(key('nodes', local-name())[1])">
         <node>
            <xsl:value-of select="local-name()"/>
         </node>
      </xsl:if>
      <xsl:apply-templates select="*" mode="name"/>
   </xsl:template>

   <!-- Match the root to start processing -->
   <xsl:template match="/">
      <xsl:call-template name="recurse">
         <xsl:with-param name="nodeIndex">1</xsl:with-param>
      </xsl:call-template>
   </xsl:template>

   <!-- Recursive template to loop through nodes of a given name -->
   <xsl:template name="recurse">
      <xsl:param name="nodeIndex"/>
      <xsl:param name="outputString"/>

      <!-- Get the name of the node for the selected index -->
      <xsl:variable name="nodeName" select="msxsl:node-set($nodeNames)/node[number($nodeIndex)]/text()"/>

      <!-- Loop through all nodes with this name -->      
      <xsl:for-each select="key('nodes', $nodeName)">

         <!-- Build up the output string using the node value -->
         <xsl:variable name="newOutputString">
            <xsl:if test="number($nodeIndex) &gt; 1">
               <xsl:value-of select="$outputString"/>
               <xsl:text>,</xsl:text>
            </xsl:if>
            <xsl:value-of select="text()"/>
         </xsl:variable>

         <xsl:choose>
            <!-- If more nodes names are left to process, recursively call the template for the next one -->
            <xsl:when test="number($nodeIndex) &lt; count(msxsl:node-set($nodeNames)/node)">
               <xsl:call-template name="recurse">
                  <xsl:with-param name="nodeIndex">
                     <xsl:value-of select="number($nodeIndex) + 1"/>
                  </xsl:with-param>
                  <xsl:with-param name="outputString">
                     <xsl:value-of select="$newOutputString"/>
                  </xsl:with-param>
               </xsl:call-template>
            </xsl:when>
            <!-- Output the string that has been built up -->
            <xsl:otherwise>
               <xsl:value-of select="$newOutputString"/>
               <xsl:text>&#13;&#10;</xsl:text>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>

Because of the recursion involved, this would become increasingly ineffecient the greater number of distinct node names that there were.

Tim C
+1  A: 

Here is an XSLT 2.0 stylesheet that should do the job, at least when I run it with Saxon 9 or with AltovaXML tools against the input you posted I get the output you described. Here is the stylesheet:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

  <xsl:param name="lf" select="'&#10;'"/>
  <xsl:param name="sep" select="','"/>

  <xsl:output method="text"/>

  <xsl:variable name="groups" as="element(group)*">
    <xsl:for-each-group select="data/descendant::*" group-by="node-name(.)">
      <group name="{current-grouping-key()}">
        <xsl:choose>
          <xsl:when test="*">
            <value/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:for-each select="current-group()[not(*)]">
              <value><xsl:value-of select="."/></value>
            </xsl:for-each>
          </xsl:otherwise>
        </xsl:choose>
      </group>
    </xsl:for-each-group>
  </xsl:variable>

  <xsl:template name="cp">
    <xsl:param name="groups" as="element(group)*"/>
    <xsl:param name="row" as="item()*" select="()"/>
    <xsl:choose>
      <xsl:when test="not($groups)">
        <xsl:value-of select="$row" separator="{$sep}"/>
        <xsl:value-of select="$lf"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:for-each select="$groups[1]/value">
          <xsl:call-template name="cp">
            <xsl:with-param name="groups" select="$groups[position() gt 1]"/>
            <xsl:with-param name="row" select="$row, ."/>
          </xsl:call-template>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="data">
    <xsl:value-of select="$groups/@name" separator="{$sep}"/>
    <xsl:value-of select="$lf"/>
    <xsl:call-template name="cp">
      <xsl:with-param name="groups" select="$groups"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>

Both Saxon 9 (with its .NET version) and AltovaXML tools (via COM interop) can be used with .NET applications, see http://www.saxonica.com/documentation/dotnet/dotnetapi.html for Saxon and http://manual.altova.com/AltovaXML/index.html?ax_netinterface.htm for Altova.

Martin Honnen