There must be a generic way to transform some hierarchical XML such as:

<element1 A="AValue" B="BValue">
   <element2 C="CValue" D="DValue">
      <element3 E="EValue1" F="FValue1"/>
      <element3 E="EValue2" F="FValue2"/>
   </element2>
   ...
</element1>

into flattened XML (HTML), picking up selected attributes along the way and giving different labels to the attributes that become column headers:

<table>
   <tr>
     <th>A_Label</th>
     <th>D_Label</th>
     <th>E_Label</th>
     <th>F_Label</th>
   </tr>
   <tr>
     <td>AValue</td>
     <td>DValue</td>
     <td>EValue1</td>
     <td>FValue1</td>
   </tr>
   <tr>
     <td>AValue</td>
     <td>DValue</td>
     <td>EValue2</td>
     <td>FValue2</td>
   </tr>
</table>

OK, so there's no fully generic solution because of the attribute relabelling, but hopefully you get what I mean. I've only just started on XSLT/XPath, so I'll work it out in good time, but any clues would be useful.

Thanks for your help. Steve

A: 

How about a Perl script? :)

Aaron
A: 

We already have a Pro*C program reading from an Oracle database. It calls a Perl script, which in turn executes some Java to extract data in XML format from the aforementioned database, before calling a batch file that executes some VBScript to FTP the file to another server. I was really hoping for something in Fortran.

Whew... until I got to the Fortran part, I thought you were serious there for a second.
davr
A: 

I'm not 100% sure what you are trying to do, but this solution may work if your element1, element2 and element3 elements are always nested consistently.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Emit a single table with one row for every element3 in the document -->
    <xsl:template match="/">
     <table>
      <xsl:apply-templates select="//element3"/>
     </table>
    </xsl:template>

    <!-- Each row pulls attributes from the grandparent (element1),
         the parent (element2) and the element3 itself.
         No further apply-templates here: //element3 above already
         selects every element3, so recursing would duplicate rows. -->
    <xsl:template match="element3">
     <tr>
      <td><xsl:value-of select="../../@A"/></td>
      <td><xsl:value-of select="../../@B"/></td>
      <td><xsl:value-of select="../@C"/></td>
      <td><xsl:value-of select="../@D"/></td>
      <td><xsl:value-of select="@E"/></td>
      <td><xsl:value-of select="@F"/></td>
     </tr>
    </xsl:template>

</xsl:stylesheet>
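If you want to check which rows the stylesheet above should produce without running an XSLT processor, the same fixed-depth lookup can be sketched with Python's standard-library `xml.etree.ElementTree`. This is only a cross-check under the same assumption the answer makes (element3 always sits directly inside element2, which sits directly inside element1). `rows_like_stylesheet` and `SAMPLE` are illustrative names, and the sample uses `C="CValue" D="DValue"`.

```python
import xml.etree.ElementTree as ET

# Illustrative input following the question's structure.
SAMPLE = """<element1 A="AValue" B="BValue">
  <element2 C="CValue" D="DValue">
    <element3 E="EValue1" F="FValue1"/>
    <element3 E="EValue2" F="FValue2"/>
  </element2>
</element1>"""


def rows_like_stylesheet(xml_text):
    """One tuple per element3: A and B from its grandparent, C and D from
    its parent, then its own E and F. Missing attributes become "" just as
    xsl:value-of yields an empty string. Assumes the fixed
    element1/element2/element3 nesting."""
    root = ET.fromstring(xml_text)
    rows = []
    # root.iter() also yields the root itself when its tag matches.
    for e1 in root.iter("element1"):
        for e2 in e1.findall("element2"):
            for e3 in e2.findall("element3"):
                rows.append((
                    e1.get("A", ""), e1.get("B", ""),
                    e2.get("C", ""), e2.get("D", ""),
                    e3.get("E", ""), e3.get("F", ""),
                ))
    return rows


for row in rows_like_stylesheet(SAMPLE):
    print(row)
```

Because ElementTree has no parent pointers, the sketch walks down with nested loops instead of using `../..` the way the XPath expressions do.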
Darrel Miller
Thanks Darrel. Works superbly.
A: 

The original question needs to be clarified:

  • What happens with BValue and CValue in the original question? Is there a reason why they shouldn't be part of the flattened structure?
  • Do all of the elements in the XML doc have 2 attributes or is this completely arbitrary?
  • Are there only 3 types of elements and are they always nested as shown in the example?
  • Can your element1 be repeated itself or is this the root element of your doc?

In XSLT it is possible to write very generic transformers but it is often much easier to write a stylesheet to transform a document when you can take any known restrictions into account.
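As an illustration of building known restrictions into the transform, here is a minimal sketch in Python (standard library only) rather than XSLT. The relabelling from the question lives in a hypothetical `LABELS` list, and the tree walk itself does not depend on element names. It assumes that every leaf element (one with no element children) becomes a row, and that each selected attribute is taken from the nearest element on the path from the root to that leaf. `flatten_to_rows`, `to_html_table` and `SAMPLE` are likewise illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: which attributes to keep, in column order,
# and the header label for each. This is where the "known restrictions" live.
LABELS = [("A", "A_Label"), ("D", "D_Label"), ("E", "E_Label"), ("F", "F_Label")]

SAMPLE = """<element1 A="AValue" B="BValue">
  <element2 C="CValue" D="DValue">
    <element3 E="EValue1" F="FValue1"/>
    <element3 E="EValue2" F="FValue2"/>
  </element2>
</element1>"""


def flatten_to_rows(elem, inherited=None):
    """Yield one dict per leaf element, holding the selected attributes found
    on the leaf or its ancestors. A deeper element overrides a same-named
    attribute higher up. Each element works on its own copy of the dict,
    so siblings never see each other's values."""
    values = dict(inherited or {})
    for name, _ in LABELS:
        if name in elem.attrib:
            values[name] = elem.attrib[name]
    children = list(elem)  # element children only; whitespace text is ignored
    if not children:
        yield values
        return
    for child in children:
        yield from flatten_to_rows(child, values)


def to_html_table(root):
    """Build a <table>: one header row of labels, then one row per leaf."""
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    for _, label in LABELS:
        ET.SubElement(header, "th").text = label
    for values in flatten_to_rows(root):
        tr = ET.SubElement(table, "tr")
        for name, _ in LABELS:
            ET.SubElement(tr, "td").text = values.get(name, "")
    return table


print(ET.tostring(to_html_table(ET.fromstring(SAMPLE)), encoding="unicode"))
```

To handle a different document, only `LABELS` needs to change. The restriction being relied on is that rows correspond to leaves, which holds for the question's example but may not hold in general.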

GerG
A: 

I have used an expanded version of the template below to flatten structured XML. Warning: the original version contained some case-specific code (it actually turned the XML into CSV), which I stripped out, and I haven't tested this version.

The basic idea should be clear: it copies every node that has no element children, and recursively applies the template to the attributes and children of any node that does. I don't think it handles attributes and comments correctly as it stands, but that should not be hard to fix.

<?xml version="1.0" encoding="UTF-8"?>

<!-- XSL template to flatten structured XML, before converting to CSV. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

    <xsl:strip-space elements="*" /> 

    <xsl:template match="/">
        <xsl:apply-templates select="//yourElementsToFlatten"/>
    </xsl:template>

    <xsl:template match="yourElementsToFlatten">
        <xsl:apply-templates select="@*|node()"/>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:choose>
            <!-- If the node has any child elements, apply this template
                to its attributes and children to flatten it -->
            <xsl:when test="count(child::*) > 0">
                <xsl:apply-templates select="@*|node()"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy>
                    <xsl:value-of select="text()" />
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>
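The same recursion (copy a node if it has no element children, otherwise descend into its children) can be sketched in Python with `xml.etree.ElementTree` for comparison. Unlike the untested template, this sketch keeps each leaf's attributes, starts from the document root rather than `yourElementsToFlatten`, and wraps the result in a single root element. The wrapper name `flattened`, the `leaves`/`flatten` helpers and the sample document are all hypothetical. ElementTree's default parser drops comments, so comments are not handled here either.

```python
import xml.etree.ElementTree as ET


def leaves(elem):
    """Yield elements that have no element children, in document order,
    recursing into any element that does have children."""
    children = list(elem)
    if children:
        for child in children:
            yield from leaves(child)
    else:
        yield elem


def flatten(xml_text, wrapper="flattened"):
    """Copy each leaf (tag, attributes, text) under one wrapper element,
    so that the output is a well-formed document with a single root."""
    out = ET.Element(wrapper)
    for leaf in leaves(ET.fromstring(xml_text)):
        copy = ET.SubElement(out, leaf.tag, dict(leaf.attrib))
        copy.text = leaf.text
    return out


# Hypothetical nested input.
SAMPLE = ("<order><customer><name>Ann</name><city>Oslo</city></customer>"
          "<total>9.50</total></order>")

print(ET.tostring(flatten(SAMPLE), encoding="unicode"))
```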
Confusion