This is similar to this question (there are more related entries, but as a new user I can only post one URL): http://stackoverflow.com/questions/1489326/xpath-get-elements-that-are-between-2-elements

I have a question about selecting the set of elements that occur between other, delimiting elements. The situation arises when transforming a flat HTML table into a hierarchical XML structure using XSLT. I tried using recursion in the templates, but Saxon refused to accept this because it resulted in a deadlock. That is most probably my fault, but let's start at the beginning.

First, the source data is this HTML table:

<table >
    <thead>
     <tr>
      <th>Column 1</th>
      <th>Column 2</th>
      <th>Column 3</th>
     </tr>
    </thead>
    <tbody>
     <tr>
      <th colspan="3" >Group 1</th>
     </tr>
     <tr>
      <td>attribute 1.1.1</td>
      <td>attribute 1.1.3</td>
      <td>attribute 1.1.2</td>
     </tr>
     <tr>
      <td>attribute 1.2.1</td>
      <td>attribute 1.2.2</td>
      <td>attribute 1.2.3</td>
     </tr>
     <tr>
      <td>attribute 1.3.1</td>
      <td>attribute 1.3.2</td>
      <td>attribute 1.3.3</td>
     </tr>
     <tr>
      <th colspan="3" >Group 2</th>
     </tr>
     <tr>
      <td>attribute 2.1.1</td>
      <td>attribute 2.1.3</td>
      <td>attribute 2.1.2</td>
     </tr>
     <tr>
      <td>attribute 2.2.1</td>
      <td>attribute 2.2.2</td>
      <td>attribute 2.2.3</td>
     </tr>
     <tr>
      <td>attribute 2.3.1</td>
      <td>attribute 2.3.2</td>
      <td>attribute 2.3.3</td>
     </tr>
    </tbody>
</table>

The targeted output in XML would be:

 <groups>
    <group name="Group 1">
     <item attribute1="attribute 1.1.1" attribute2="attribute 1.1.3" attribute3="attribute 1.1.2"/>
     <item attribute1="attribute 1.2.1" attribute2="attribute 1.2.2" attribute3="attribute 1.2.3"/>
     <item attribute1="attribute 1.3.1" attribute2="attribute 1.3.2" attribute3="attribute 1.3.3"/>
    </group>
    <group name="Group 2">
     <item attribute1="attribute 2.1.1" attribute2="attribute 2.1.3" attribute3="attribute 2.1.2"/>
     <item attribute1="attribute 2.2.1" attribute2="attribute 2.2.2" attribute3="attribute 2.2.3"/>
     <item attribute1="attribute 2.3.1" attribute2="attribute 2.3.2" attribute3="attribute 2.3.3"/>
    </group>
</groups>

So I want to collect all the item entries (TR elements) and add them to a group. This basically comes down to selecting all following-sibling TR elements until we encounter one that has a TH element as a child. If I could determine the position of the next TR that has a TH child, which marks the heading of a new group, this could be done with:

<xsl:for-each select="tbody/tr">
    <xsl:if test="th">
     <xsl:element name="group">
      <xsl:attribute name="name"><xsl:value-of select="th"/></xsl:attribute>
      <xsl:for-each select="following-sibling::tr[position() &lt; $positionOfNextThElement]">
       <xsl:call-template name="item"/>
      </xsl:for-each>
     </xsl:element>
    </xsl:if>
</xsl:for-each>

However, I am not able to determine the position of the next TR that contains a TH.

As stated, I tried working with recursion in templates: always call the "item" template, and in this template determine whether to invoke it on the next item as well. I think the problem is in the invocation of the template from within itself. The context item does not advance? Should I pass a parameter to indicate which item we are working on?

Anyhow, this was what I came up with:

<xsl:for-each select="tbody/tr">
    <xsl:if test="th">
     <xsl:element name="group">
      <xsl:attribute name="name"><xsl:value-of select="th"/></xsl:attribute>
      <xsl:call-template name="item"/>
     </xsl:element>
    </xsl:if>
</xsl:for-each>

<xsl:template name="item">
    <xsl:element name="item">
     <xsl:attribute name="attribute1"><xsl:value-of select="following-sibling::tr[1]/td[1]"/></xsl:attribute>
     <xsl:attribute name="attribute2"><xsl:value-of select="following-sibling::tr[1]/td[2]"/></xsl:attribute>
     <xsl:attribute name="attribute3"><xsl:value-of select="following-sibling::tr[1]/td[3]"/></xsl:attribute>
    </xsl:element>
    <!-- When the next element has not got a TH tag, continue with invoking this template -->
    <xsl:if test="count(following-sibling::tr[1]/th) != 1">
     <xsl:call-template name="item"/>
    </xsl:if>
</xsl:template>

Any suggestions on how to realize this are welcome!

A: 

The context does not advance when you call the "item" template recursively because xsl:call-template never changes the context node: the called template runs with the same current node as its caller. So, as you probably saw, the transform just enters infinite recursion.
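For comparison, here is a minimal sketch of how the recursive idea could be made to terminate. It swaps xsl:call-template for xsl:apply-templates, which does move the context to the selected node. The mode name "item" is my own choice for illustration, and the sketch assumes the table layout shown in the question (header rows contain a TH, data rows contain exactly three TD cells).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:template match="table">
        <groups>
            <xsl:apply-templates select="tbody/tr[th]"/>
        </groups>
    </xsl:template>

    <!-- A header row opens a group and hands over to the row right after it,
         but only if that row is a data row (it has no TH child). -->
    <xsl:template match="tr[th]">
        <group name="{th}">
            <xsl:apply-templates select="following-sibling::tr[1][not(th)]" mode="item"/>
        </group>
    </xsl:template>

    <!-- A data row emits one item, then recurses to the next sibling row.
         The recursion stops at the next header row or at the end of the table,
         because the select expression is empty in both cases. -->
    <xsl:template match="tr" mode="item">
        <item attribute1="{td[1]}" attribute2="{td[2]}" attribute3="{td[3]}"/>
        <xsl:apply-templates select="following-sibling::tr[1][not(th)]" mode="item"/>
    </xsl:template>

</xsl:stylesheet>
```

Each step selects only the immediately following row, so the context advances by one row per call. For very long groups the recursion depth grows with the number of rows, which may matter on some processors.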

Assuming that you always need to produce exactly three attributes, you don't even need recursion.

Try this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:template match="table">
        <groups>
            <xsl:apply-templates select="tbody/tr[th]"/>
        </groups>
    </xsl:template>

    <xsl:template match="tr[th]">
        <xsl:variable name="id" select="generate-id(.)"/>
        <group name="{string(th)}">
            <xsl:apply-templates
                select="following-sibling::tr[not(th)][generate-id(preceding-sibling::tr[th][1]) = $id]"/>
        </group>
    </xsl:template>

    <xsl:template match="tr">
        <item attribute1="{td[1]}" attribute2="{td[2]}" attribute3="{td[3]}" />                    
    </xsl:template>

</xsl:stylesheet>

This works by applying templates to each header row. Each of those templates uses that complicated XPath to select "its" following rows: any following sibling rows whose first preceding header row is that specific row.

Of course, if the number of attributes varies, then you will need to recurse there and pass a parameter indicating the position.

There are a couple of established methods for XSLT grouping, one of which is recursive, like you were doing. Another method is called Muenchian grouping. A good write-up is here.
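As a rough illustration of the key-based approach, here is a sketch that borrows the xsl:key mechanism used in Muenchian grouping. It is not the full Muenchian pattern, since the header rows already identify the groups here. The key name "rows-by-header" is made up for this example, and the same three-column table from the question is assumed.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Index every data row by the generated id of its nearest preceding header row. -->
    <xsl:key name="rows-by-header" match="tr[not(th)]"
             use="generate-id(preceding-sibling::tr[th][1])"/>

    <xsl:template match="table">
        <groups>
            <xsl:apply-templates select="tbody/tr[th]"/>
        </groups>
    </xsl:template>

    <!-- Look up the rows that belong to this header row through the key. -->
    <xsl:template match="tr[th]">
        <group name="{th}">
            <xsl:apply-templates select="key('rows-by-header', generate-id())"/>
        </group>
    </xsl:template>

    <xsl:template match="tr">
        <item attribute1="{td[1]}" attribute2="{td[2]}" attribute3="{td[3]}"/>
    </xsl:template>

</xsl:stylesheet>
```

The selection logic is the same as in the stylesheet above; the key just lets the processor build the row-to-header index once instead of re-evaluating the preceding-sibling test for every header row, which can help on large tables.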

James Sulak
Thumbs up for your answer! This did the trick, fortunately the number of attributes is static. I'll check out the referenced documentation for future grouping issues.
Holtkamp
A: 

An alternative solution that handles a variable number of attributes without recursion:

<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <xsl:template match="table">
    <groups>
      <xsl:apply-templates select="tbody/tr[th]"/>
    </groups>
  </xsl:template>

  <xsl:template match="tr[th]">
    <group name="{th}">
      <xsl:apply-templates select="
        following-sibling::tr[not(th)][
          generate-id(preceding-sibling::tr[th][1]) = generate-id(current())
        ]
      "/>
    </group>
  </xsl:template>

  <xsl:template match="tr">
    <item>
     <xsl:apply-templates select="td" />
    </item>
  </xsl:template>

  <xsl:template match="td">
    <xsl:attribute name="attribute{position()}">
      <xsl:value-of select="." />
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
Tomalak