views:

289

answers:

2

I'm trying to use Muenchian grouping in my XSLT to group matching nodes, but I only want to group within a parent node, not across the entire source XML document.

Given XSLT and XML as follows (apologies for the length of my sample code):

XSLT

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"> 
 <xsl:output method="html" indent="yes"/>

 <xsl:key name="contacts-by-surname" match="contact" use="surname" />
 <xsl:template match="records">
  <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[1]) = 1]">
   <xsl:sort select="surname" />
   <xsl:value-of select="surname" />,<br />
   <xsl:for-each select="key('contacts-by-surname', surname)">
    <xsl:sort select="forename" />
    <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
   </xsl:for-each>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

XML

<root>
 <records>
  <contact id="0001">
   <title>Mr</title>
   <forename>John</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0002">
   <title>Dr</title>
   <forename>Amy</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0003">
   <title>Mrs</title>
   <forename>Mary</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0004">
   <title>Ms</title>
   <forename>Anne</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0005">
   <title>Mr</title>
   <forename>Peter</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0006">
   <title>Dr</title>
   <forename>Indy</forename>
   <surname>Jones</surname>
  </contact>
 </records>
 <records>
  <contact id="0001">
   <title>Mr</title>
   <forename>James</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0002">
   <title>Dr</title>
   <forename>Mandy</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0003">
   <title>Mrs</title>
   <forename>Elizabeth</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0004">
   <title>Ms</title>
   <forename>Sally</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0005">
   <title>Mr</title>
   <forename>George</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0006">
   <title>Dr</title>
   <forename>Harry</forename>
   <surname>Jones</surname>
  </contact>
 </records>
</root>

RESULT

Jones,
Amy (Dr)
Anne (Ms)
Harry (Dr)
Indy (Dr)
Mandy (Dr)
Sally (Ms)

Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)
John (Mr)
Mary (Mrs)
Peter (Mr)

How do I group within each <records> and achieve this result:

Jones,
Amy (Dr)
Anne (Ms)
Indy (Dr)

Smith,
John (Mr)
Mary (Mrs)
Peter (Mr)

Jones,
Harry (Dr)
Mandy (Dr)
Sally (Ms)

Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)
+3  A: 

Took me some time ... I was about to give up but continued nevertheless :)

The drawback of the key function is that the key generated will always be for the entire xml. Hence you should concatenate additional information in your key to make it more specific. In the e.g. below, I am concatenating the position of records node, so that I get keys for distinct surnames per records.

Here's the xslt:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="html" indent="yes"/>
  <xsl:key name="distinct-surname" match="contact" use="concat(generate-id(..), '|', surname)"/>
  <xsl:template match="records">
    <xsl:for-each select="contact[generate-id() = generate-id(key('distinct-surname', concat(generate-id(..), '|', surname))[1])]">
      <xsl:sort select="surname" />
      <xsl:value-of select="surname" />,<br />
      <xsl:for-each select="key('distinct-surname', concat(generate-id(..), '|', surname))">
        <xsl:sort select="forename" />
        <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>  
</xsl:stylesheet>

This is the result:

Jones,
Amy (Dr)
Anne (Ms)
Indy (Dr)
Smith,
John (Mr)
Mary (Mrs)
Peter (Mr)
Jones,
Harry (Dr)
Mandy (Dr)
Sally (Ms)
Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)

Please note that the result is sorted on the forenames too. If you don't want to sort it on forenames, you need to remove the line <xsl:sort select="forename" />

Rashmi Pandit
Great answer and explanation. Thanks!
kristian
That's what I would have done, +1. I propose a tiny change: Instead of `concat(count(parent::*/preceding-sibling::*), surname)`, use `concat(generate-id(..), '|', surname)`. It's shorter, more efficient, and a bit safer because of the additional delimiter char.
Tomalak
Tomalak, I have edited the xslt as per your suggestion. Thanks :)
Rashmi Pandit
+3  A: 

There is simpler method, by adding a predicate which ensure than contacts involved in muench test are child of the current records.

<xsl:key name="contacts-by-surname" match="contact" use="surname" />
<xsl:template match="records">
  <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[generate-id(parent::records) = generate-id(current())][1]) = 1]">
   <xsl:sort select="surname" />
   <xsl:value-of select="surname" />,<br />
   <xsl:for-each select="key('contacts-by-surname', surname)[generate-id(parent::records) = generate-id(current()/parent::records)]">
    <xsl:sort select="forename" />
    <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
   </xsl:for-each>
  </xsl:for-each>
</xsl:template>
Erlock
It may be simpler, but it also is less efficient. I would say that `contact[generate-id() = generate-id(…[…])]` is O(n²) in the worst case, while @Rashmi Pandit's `contact[generate-id() = generate-id(…)]` is O(n).
Tomalak
Maybe less efficient, but more robust I think. Concatening strings into compound keys implies that the separator string never occurs in any used string. I prefer deterministic behaviour over fastest run. :)
Erlock
Hm… I can think of a way that id-value is ambiguous (id `"key-30"`, value `"0"` vs. id `"key-300"`, value `""`), but for id-separator-value (it would be `"id-30|0"` vs. `"id-300|"`)? The presence of the separator in the value is not relevant, IMHO. Am I missing something?
Tomalak
Erlock
Okay, agreed. The up-vote has been mine all the time. ;-)
Tomalak