tags:

views:

296

answers:

1

I need some assistance converting an xml document to a CSV file using an xslt stylesheet. I am trying to use the following xsl and I can't seem to get it right. I want my comma delimited file to include column headings, followed by the data. My biggest issues are removing the final comma after the last item and inserting a carriage return so each group of data appears on a separate line. I have been using XML Notepad.

  <xsl:template match="/">
        <xsl:element name="table">
              <xsl:apply-templates select="/*/*[1]" mode="header" />
              <xsl:apply-templates select="/*/*" mode="row" />
        </xsl:element>
  </xsl:template>

  <xsl:template match="*" mode="header">
        <xsl:element name="tr">
              <xsl:apply-templates select="./*" mode="column" />
        </xsl:element>
  </xsl:template>

  <xsl:template match="*" mode="row">
        <xsl:element name="tr">
              <xsl:apply-templates select="./*" mode="node" />
        </xsl:element>
  </xsl:template>

  <xsl:template match="*" mode="column">
        <xsl:element name="th">
              <xsl:value-of select="translate(name(.),'qwertyuiopasdfghjklzxcvbnm_','QWERTYUIOPASDFGHJKLZXCVBNM ')" />
        </xsl:element>,
  </xsl:template>

  <xsl:template match="*" mode="node">
        <xsl:element name="td">
              <xsl:value-of select="." />
        </xsl:element>,
  </xsl:template> 

A: 

I use this to simple XSLT to convert XML to CSV; it assumes all child nodes of the root node are to be rows in the CSV, taking the element names of the first child of the root to be field names.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="*/*[1]/*">
      <xsl:value-of select="name()" />
      <xsl:if test="not(position() = last())">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="*/*" mode="row"/>
  </xsl:template>

  <xsl:template match="*" mode="row">
    <xsl:apply-templates select="*" mode="data" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="*" mode="data">
    <xsl:choose>
      <xsl:when test="contains(text(),',')">
        <xsl:text>&quot;</xsl:text>
        <xsl:call-template name="doublequotes">
          <xsl:with-param name="text" select="text()" />
        </xsl:call-template>
        <xsl:text>&quot;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="." />
      </xsl:otherwise>
    </xsl:choose>
    <xsl:if test="position() != last()">,</xsl:if>
  </xsl:template>

  <xsl:template name="doublequotes">
    <xsl:param name="text" />
    <xsl:choose>
      <xsl:when test="contains($text,'&quot;')">
        <xsl:value-of select="concat(substring-before($text,'&quot;'),'&quot;&quot;')" />
        <xsl:call-template name="doublequotes">
          <xsl:with-param name="text" select="substring-after($text,'&quot;')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

So this XML:

<csv>
  <row>
    <field1>foo</field1>
    <field2>ba"r</field2>
  </row>
  <row>
    <field1>foo,2</field1>
    <field2>bar,"2</field2>
  </row>
</csv>

Converts to:

field1,field2
foo,ba"r
"foo,2","bar,""2"

Not sure if this helps though, it depends how your XML is laid out.

Edit: Here's a more thorough transform:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  <xsl:key name="field" match="/*/*/*" use="name()" />
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="*/*/*[generate-id() = generate-id(key('field',name())[1])]">
      <xsl:value-of select="name()" />
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="*/*" mode="row"/>
  </xsl:template>

  <xsl:template match="*" mode="row">
    <xsl:variable name="row" select="*" />
    <xsl:for-each select="/*/*/*[generate-id() = generate-id(key('field',name())[1])]">
      <xsl:variable name="name" select="name()" />
      <xsl:apply-templates select="$row[name()=$name]" mode="data" />
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="*" mode="data">
    <xsl:choose>
      <xsl:when test="contains(text(),',')">
        <xsl:text>&quot;</xsl:text>
        <xsl:call-template name="doublequotes">
          <xsl:with-param name="text" select="text()" />
        </xsl:call-template>
        <xsl:text>&quot;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="." />
      </xsl:otherwise>
    </xsl:choose>
    <xsl:if test="position() != last()">,</xsl:if>
  </xsl:template>

  <xsl:template name="doublequotes">
    <xsl:param name="text" />
    <xsl:choose>
      <xsl:when test="contains($text,'&quot;')">
        <xsl:value-of select="concat(substring-before($text,'&quot;'),'&quot;&quot;')" />
        <xsl:call-template name="doublequotes">
          <xsl:with-param name="text" select="substring-after($text,'&quot;')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

This one will create a column in your CSV for all tag names that exist in all 'rows', and populate the appropriate column in each row.

Flynn1179
Perfect! Thanks so much for your help!
Brad H
@Flynn1179: You wrote: *it assumes all child nodes of the root node are to be rows in the CSV*. I think you're assuming more. Taking your example input document: you're assuming in your stylesheet that the fields in the first row are all the fields that can exist, and further that the fields of the individual rows are in the same order and that none are missing .
Alejandro
Of course, it's a fairly simple routine. I originally wrote it to convert a <table> tag in xhtml to CSV, which unless you've got colspan attributes works very well. XML's obviously a lot more flexible a format than CSV, so it's not going to convert well if it's not conforming to those assumptions. You can always verify that it is with an XSD first of course.
Flynn1179
Thinking about it, it wasn't too hard to adapt the transform to do this properly; I've edited in a more thorough XSLT that doesn't make these assumptions; it still assumes your XML is only two levels deep however (below the root), and that each element one level deep is intended as a row. One limitation it does have is where a 'row' has identical tag names, ironically like a <table> tag. My original transform actually took the contents of the <th> tags as the column names rather than the node names; my first transform above was a more generalized version.
Flynn1179