I must transform some XML data into a paginated list of fields. Here is an example.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<data>
    <books>
        <book title="t0"/>
        <book title="t1"/>
        <book title="t2"/>
        <book title="t3"/>
        <book title="t4"/>
    </books>
    <library name="my library"/>
</data>

Desired output:

<?xml version="1.0" encoding="UTF-8"?>
<pages>
    <page number="1">
        <field name="library_name" value="my library"/>
        <field name="book_1" value="t0"/>
        <field name="book_2" value="t1"/>
    </page>
    <page number="2">
        <field name="book_1" value="t2"/>
        <field name="book_2" value="t3"/>
    </page>
    <page number="3">
        <field name="book_1" value="t4"/>
    </page>
</pages>

In the above example, I want at most 2 fields named book_n (with n ranging from 1 to 2) per page. Each <page> element must have a number attribute. Finally, the field named library_name must appear only on the first <page>.

Here is my current solution using XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:output method="xml" indent="yes" omit-xml-declaration="no" />

    <xsl:variable name="max" select="2"/>

    <xsl:template match="//books">

        <xsl:for-each-group select="book" group-ending-with="*[position() mod $max = 0]">

            <xsl:variable name="pageNum" select="position()"/>

            <page number="{$pageNum}">

                <xsl:for-each select="current-group()">
                    <xsl:variable name="idx" select="if (position() mod $max = 0) then $max else position() mod $max"/>
                    <field value="{@title}">
                        <xsl:attribute name="name">book_<xsl:value-of select="$idx"/>
                        </xsl:attribute>
                    </field>                
                </xsl:for-each>

                <xsl:if test="$pageNum = 1">
                    <xsl:call-template name="templateFor_library"/>            
                </xsl:if>    

            </page>

        </xsl:for-each-group>    

    </xsl:template>

    <xsl:template name="templateFor_library">
        <xsl:for-each select="//library">
            <field name="library_name" value="{@name}" />
        </xsl:for-each>        
    </xsl:template>

</xsl:stylesheet> 

Is there a better/simpler way to perform this transformation?

+3  A: 

Yes, there is.

<xsl:param name="pagesize" select="2" />

<xsl:template match="data">
  <pages>
    <xsl:apply-templates mode="page" select="
      books/book[position() mod $pagesize = 1]
    " />
  </pages>
</xsl:template>

<xsl:template match="book" mode="page">
  <page number="{position()}">
    <xsl:apply-templates select="
      . | following-sibling::book[position() &lt; $pagesize]
    " />
  </page>
</xsl:template>

<xsl:template match="book">
  <field name="book_{position()}" value="{@title}" />
</xsl:template>

EDIT #1: The above is XSLT 1.0 compliant. You can still modify it to use XSLT 2.0's <xsl:for-each-group> if you want to. Personally, I find separate templates more appealing than one big fat nested for-each construct; YMMV.
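
For comparison, here is a minimal sketch of what an XSLT 2.0 variant with <xsl:for-each-group> might look like. It is only an illustration, not a tested drop-in replacement: it assumes an XSLT 2.0 processor, assumes that pagesize is an integer, and leaves out the library_name field. The group-adjacent key (position() - 1) idiv $pagesize is one common idiom for fixed-size chunks; other grouping expressions would work as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" />

  <!-- Assumption: pagesize is an integer, as idiv requires -->
  <xsl:param name="pagesize" select="2" />

  <xsl:template match="data">
    <pages>
      <!-- Books 1..pagesize get key 0, the next pagesize books get key 1, and so on -->
      <xsl:for-each-group select="books/book"
          group-adjacent="(position() - 1) idiv $pagesize">
        <!-- Here position() is the index of the current group, i.e. the page number -->
        <page number="{position()}">
          <xsl:apply-templates select="current-group()" />
        </page>
      </xsl:for-each-group>
    </pages>
  </xsl:template>

  <!-- Here position() is the index of the book within its group -->
  <xsl:template match="book">
    <field name="book_{position()}" value="{@title}" />
  </xsl:template>

</xsl:stylesheet>
```

The book template is the same as in the 1.0 version; only the way the pages are cut changes.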

EDIT #2: As requested in the comments: to make something appear on the first page only, modify the mode="page" template:

<xsl:template match="book" mode="page">
  <page number="{position()}">
    <xsl:if test="position() = 1">
      <xsl:attribute name="library_name">
        <xsl:value-of select="ancestor::data/library/@name" />
      </xsl:attribute>
    </xsl:if>
    <xsl:apply-templates select="
      . | following-sibling::book[position() &lt; $pagesize]
    " />
  </page>
</xsl:template>
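
If the library name should instead be emitted as a <field> element, as in the desired output in the question, the same position() = 1 test can wrap a literal <field> rather than an attribute. The following is a minimal sketch under that assumption, written as a complete XSLT 1.0 stylesheet so it can be run on its own; the <xsl:output> settings are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" />

  <xsl:param name="pagesize" select="2" />

  <!-- Select the first book of every page -->
  <xsl:template match="data">
    <pages>
      <xsl:apply-templates mode="page" select="
        books/book[position() mod $pagesize = 1]
      " />
    </pages>
  </xsl:template>

  <xsl:template match="book" mode="page">
    <page number="{position()}">
      <!-- Only the first page gets the library_name field -->
      <xsl:if test="position() = 1">
        <field name="library_name" value="{ancestor::data/library/@name}" />
      </xsl:if>
      <!-- The page's first book plus up to pagesize - 1 following siblings -->
      <xsl:apply-templates select="
        . | following-sibling::book[position() &lt; $pagesize]
      " />
    </page>
  </xsl:template>

  <xsl:template match="book">
    <field name="book_{position()}" value="{@title}" />
  </xsl:template>

</xsl:stylesheet>
```

Applied to the input from the question, this should yield three pages, with library_name as the first field of page 1 only.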
Tomalak
Great! Thank you! And how would you ensure that the field named library_name appears only on the first <page> element?
MarcoS
@MarcoS: See modified solution. I would strongly advise against that, though. The library name is logically part of the `<pages>` element and should go *there*, instead of "on the first `<page>` element", which does not make a whole lot of sense.
Tomalak
@Tomalak: I agree that `library_name` is logically part of `<pages>`, but the current requirement is to put it on the first `<page>`. Thank you.
MarcoS
@MarcoS: Well, have fun with that, then. ;)
Tomalak