Let's say I have the following XML structure:

<entry>
  <countries>USA, Australia, Canada</countries>  
</entry>
<entry>
  <countries>USA, Australia</countries>
</entry>
<entry>
  <countries>Australia, Belgium</countries>
</entry>
<entry>
  <countries>Croatia</countries>
</entry>

I would like to count the number of entries in which each country appears. I can only use client-side XSLT (no custom server code allowed). The end result needs to look like this:

Country    | Count
-----------|--------
Australia  |     3
USA        |     2
Belgium    |     1
Canada     |     1
Croatia    |     1

As Mike pointed out, this XML structure could be improved; however, it is produced by a third-party system and I cannot change it.

Is it possible to achieve this with XSLT, and if so, how?

A: 

Is there a reason you aren't using the format:

<entry>
  <countries>
    <country>USA</country>
    <country>Australia</country>
    <country>Canada</country>
  </countries>  
</entry>

The current format doesn't really match how XML data should be structured.

As you've said you can't change the data format, try a combination of tokenize() and count() (provided you have XSLT 2.0 support; otherwise I think you're out of luck).
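
A minimal sketch of that tokenize()/count() idea, assuming an XSLT 2.0 processor is available. That is an assumption worth checking, since browsers' built-in processors have generally been limited to XSLT 1.0. It also uses xsl:for-each-group, which is not mentioned above, to collect the distinct names:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- split every <countries> value on commas, drop empty tokens,
         and group the remaining tokens by their trimmed text -->
    <xsl:for-each-group
        select="for $c in //countries
                return tokenize($c, ',')[normalize-space(.) != '']"
        group-by="normalize-space(.)">
      <!-- most frequent first, then alphabetical -->
      <xsl:sort select="count(current-group())" data-type="number" order="descending" />
      <xsl:sort select="current-grouping-key()" />
      <xsl:value-of select="current-grouping-key()" />
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="count(current-group())" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Each output line is the country name, a tab, and the number of times it occurs.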

Mike McQuaid
I know, and I agree, but this output is produced by a third-party system and I cannot change that.
Toni Frankola
+2  A: 

In XSLT 1.0, your best bet is to use a two-step approach.

  1. tokenize the comma-delimited input into separate elements
  2. group the resulting elements

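Step #1 tokenizes the input (it assumes the <entry> elements sit inside a document element named <root>, which the snippet in the question omits):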

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <xsl:template match="/root">
    <countries>
      <xsl:apply-templates select="entry" />
    </countries>
  </xsl:template>

  <xsl:template match="entry">
    <xsl:call-template name="tokenize">
      <xsl:with-param name="input" select="countries" />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="tokenize">
    <xsl:param name="input" />

    <xsl:variable name="list" select="concat($input, ',')" />
    <xsl:variable name="head" select="substring-before($list, ',')" />
    <xsl:variable name="tail" select="substring-after($list, ',')" />

    <xsl:if test="normalize-space($head) != ''">
      <country>
        <xsl:value-of select="normalize-space($head)" />
      </country>
      <xsl:call-template name="tokenize">
        <xsl:with-param name="input" select="$tail" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

produces:

<countries>
  <country>USA</country>
  <country>Australia</country>
  <country>Canada</country>
  <country>USA</country>
  <country>Australia</country>
  <country>Australia</country>
  <country>Belgium</country>
  <country>Croatia</country>
</countries>

Step #2 applies Muenchian grouping to the intermediary result:

<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="text" />

  <xsl:key name="kCountry" match="country" use="." />

  <xsl:template match="/countries">
    <xsl:apply-templates select="country">
      <xsl:sort select="count(key('kCountry', .))" data-type="number" order="descending" />
      <xsl:sort select="." data-type="text" order="ascending" />
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="country">
    <xsl:if test="generate-id() = generate-id(key('kCountry', .)[1])">
      <xsl:value-of select="." />
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="count(key('kCountry', .))" />
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

produces the desired result (tab-separated; column formatting is left as an exercise for the reader):

Australia  3
USA        2
Belgium    1
Canada     1
Croatia    1
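
One possible way to do that formatting in XSLT 1.0, as a sketch only: pad the name on the right and the count on the left with substring() and concat(). The two templates below would replace the ones matching /countries and country in step #2. The column widths (11 and 6 characters) are assumptions read off the table in the question, and names longer than 11 characters would be truncated.

```xml
<xsl:template match="/countries">
  <!-- fixed header, copied from the layout shown in the question -->
  <xsl:text>Country    | Count&#10;</xsl:text>
  <xsl:text>-----------|--------&#10;</xsl:text>
  <xsl:apply-templates select="country">
    <xsl:sort select="count(key('kCountry', .))" data-type="number" order="descending" />
    <xsl:sort select="." data-type="text" order="ascending" />
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="country">
  <xsl:if test="generate-id() = generate-id(key('kCountry', .)[1])">
    <xsl:variable name="n" select="count(key('kCountry', .))" />
    <!-- left-align the name in an 11-character column -->
    <xsl:value-of select="substring(concat(., '           '), 1, 11)" />
    <xsl:text>|</xsl:text>
    <!-- right-align the count in a 6-character column -->
    <xsl:value-of select="substring(concat('      ', $n), string-length($n) + 1)" />
    <xsl:text>&#10;</xsl:text>
  </xsl:if>
</xsl:template>
```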

The process can be done in a single transformation, with the help of the node-set() extension function. However, you would lose the ability to use an XSL key, which might result in slower performance for large inputs. YMMV.

The necessary modification of step #1 would be as follows. It uses the MSXSL extensions; other vendors use a different namespace declaration, which reduces the portability of this approach:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="msxsl"
>

  <xsl:template match="/root">
    <!-- store the list of <country>s as a result-tree-fragment -->
    <xsl:variable name="countries">
      <xsl:apply-templates select="entry" />
    </xsl:variable>
    <!-- convert the result-tree-fragment to a usable node-set -->
    <xsl:variable name="country" select="msxsl:node-set($countries)/country" />

    <!-- iteration, sorting and grouping in one step -->
    <xsl:for-each select="$country">
      <xsl:sort select="count($country[. = current()])" data-type="number" order="descending" />
      <xsl:sort select="." data-type="text" order="ascending" />
      <xsl:if test="generate-id() = generate-id($country[. = current()][1])">
        <xsl:value-of select="." />
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select="count($country[. = current()])" />
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- ... the remainder of the stylesheet #1 is unchanged ... -->

</xsl:stylesheet>

With this approach, a separate step #2 becomes unnecessary. The result is the same as above. For small inputs, the difference in performance will not be noticeable.
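
For processors that implement the EXSLT common module instead of the MSXSL one, the same idea might look like the sketch below. Only the namespace and the prefix of node-set() change; whether a given processor supports exsl:node-set() is an assumption to verify. The xsl:output declaration is an addition so that the result is plain text.

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="exsl"
>
  <xsl:output method="text" />

  <xsl:template match="/root">
    <xsl:variable name="countries">
      <xsl:apply-templates select="entry" />
    </xsl:variable>
    <!-- EXSLT equivalent of msxsl:node-set() -->
    <xsl:variable name="country" select="exsl:node-set($countries)/country" />

    <xsl:for-each select="$country">
      <xsl:sort select="count($country[. = current()])" data-type="number" order="descending" />
      <xsl:sort select="." data-type="text" order="ascending" />
      <xsl:if test="generate-id() = generate-id($country[. = current()][1])">
        <xsl:value-of select="." />
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select="count($country[. = current()])" />
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- the two templates below are the same as in stylesheet #1 -->
  <xsl:template match="entry">
    <xsl:call-template name="tokenize">
      <xsl:with-param name="input" select="countries" />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="tokenize">
    <xsl:param name="input" />
    <xsl:variable name="list" select="concat($input, ',')" />
    <xsl:variable name="head" select="substring-before($list, ',')" />
    <xsl:variable name="tail" select="substring-after($list, ',')" />
    <xsl:if test="normalize-space($head) != ''">
      <country>
        <xsl:value-of select="normalize-space($head)" />
      </country>
      <xsl:call-template name="tokenize">
        <xsl:with-param name="input" select="$tail" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```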

Tomalak
A: 
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:str="http://exslt.org/strings"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="xsl str msxsl"
>
  <xsl:import href="str.split.template.xsl"/>
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="countries">
      <xsl:call-template name="get-countries" />
    </xsl:variable>

    <table>
    <xsl:for-each select="msxsl:node-set($countries)/country[not(. = preceding::country)]">
      <xsl:variable name="name" select="./text()"/>
      <tr>
        <td>
          <xsl:value-of select="$name" />
        </td>
        <td>
          <xsl:value-of select="count(msxsl:node-set($countries)/country[. = $name])" />
        </td>
      </tr>
    </xsl:for-each>
    </table>
  </xsl:template>

  <xsl:template name="get-countries">
    <xsl:for-each select="//countries">
      <xsl:variable name="countries-raw">
        <xsl:call-template name="str:split">
          <xsl:with-param name="string" select="text()"/>
          <xsl:with-param name="pattern" select="','" />
        </xsl:call-template>
      </xsl:variable>

      <xsl:for-each select="msxsl:node-set($countries-raw)/token">
        <country>
          <xsl:value-of select="normalize-space(.)"/>
        </country>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

str.split.template.xsl is part of the str module of EXSLT (http://www.exslt.org/download.html).

Lloyd
I think sorting the output was part of the task.
Tomalak