ansaurus

Question

XSLT 1.0: grouping and removing duplicate

Answer 1

+2 A:

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kwrkTimeByNameTask" match="workTime"
  use="concat(../name, '+', @taskID)"/>

 <xsl:key name="kDateByName" match="date"
  use="../name"/>

 <xsl:key name="kwrkTimeByNameTaskDate" match="workTime"
  use="concat(../name, '+', @taskID, '+', ../date)"/>

 <xsl:template match="/">
   <xsl:for-each select=
    "*/*/workTime
           [generate-id()
           =
            generate-id(key('kwrkTimeByNameTask',
                             concat(../name, '+', @taskID)
                            )[1]
                        )
           ]
    ">
      <xsl:sort select="../name"/>
      <xsl:sort select="@taskID" data-type="number"/>

      <xsl:variable name="vcurTaskId" select="@taskID"/>
      <Person>
        <name><xsl:value-of select="../name"/></name>
        <taskID><xsl:value-of select="@taskID"/></taskID>

          <xsl:for-each select=
           "key('kDateByName', ../name)
                  [key('kwrkTimeByNameTaskDate',
                       concat(../name, '+', current()/@taskID, '+', .)
                      )
                  ]
           ">
             <workTime>
               <date><xsl:value-of select="."/></date>
               <time>
                <xsl:value-of select=
                 "key('kwrkTimeByNameTaskDate',
                  concat(../name, '+', $vcurTaskId, '+', .)
                 )"/>
               </time>
             </workTime>
          </xsl:for-each>
      </Person>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML (after correcting several issues to make it well-formed):

<t>
    <Person>
        <name>John</name>
        <date>June12</date>
        <workTime taskID="1">34</workTime>
        <workTime taskID="1">35</workTime>
        <workTime taskID="2">12</workTime>
    </Person>
    <Person>
        <name>John</name>
        <date>June13</date>
        <workTime taskID="1">21</workTime>
        <workTime taskID="2">11</workTime>
        <workTime taskID="2">14</workTime>
    </Person>
</t>

produces the wanted, correct result:

<Person>
   <name>John</name>
   <taskID>1</taskID>
   <workTime>
      <date>June12</date>
      <time>34</time>
   </workTime>
   <workTime>
      <date>June13</date>
      <time>21</time>
   </workTime>
</Person>
<Person>
   <name>John</name>
   <taskID>2</taskID>
   <workTime>
      <date>June12</date>
      <time>12</time>
   </workTime>
   <workTime>
      <date>June13</date>
      <time>11</time>
   </workTime>
</Person>

Explanation:

First we obtain all workTime elements with unique pairs of ../name, @taskID by using the Muenchian method for grouping.
We sort these by ../name and then by @taskID, in that order.
For each such workTime we take all date elements that are listed under the same ../name and keep only those for which there is a workTime with the same ../name, the same ../date and the same @taskID.
In the previous step we use two different auxiliary keys: 'kDateByName' indexes all date elements by their ../name, while 'kwrkTimeByNameTaskDate' indexes all workTime elements by their ../name, their @taskID and their ../date.
Finally, because <xsl:value-of> in XSLT 1.0 outputs only the first node of a node-set, a second workTime with the same name, task and date (such as the 35 that follows 34) is dropped from the output.

So, the meaning of the following:

          <xsl:for-each select=
           "key('kDateByName', ../name)
                  [key('kwrkTimeByNameTaskDate',
                       concat(../name, '+', current()/@taskID, '+', .)
                      )
                  ]
           ">

is:

For each date for that name, such that a workTime for that name, date and @taskID (of the current workTime for the outer <xsl:for-each>) exists, do whatever is in the body of this <xsl:for-each> instruction.
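
The first step above (one workTime per distinct name and @taskID pair) can be tried in isolation. The stylesheet below is only a minimal sketch: it reuses the 'kwrkTimeByNameTask' key from the answer, but the text output and the ' / task ' label are illustrative choices, not part of the original solution.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <!-- Same composite key as in the answer: name + taskID -->
 <xsl:key name="kwrkTimeByNameTask" match="workTime"
  use="concat(../name, '+', @taskID)"/>

 <xsl:template match="/">
   <!-- Keep a workTime only if it is the first node of its key group -->
   <xsl:for-each select=
    "*/*/workTime
           [generate-id()
           =
            generate-id(key('kwrkTimeByNameTask',
                             concat(../name, '+', @taskID)
                            )[1]
                        )
           ]
    ">
     <!-- Illustrative output: one text line per distinct pair -->
     <xsl:value-of select="concat(../name, ' / task ', @taskID, '&#10;')"/>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the sample XML above, this should print one line for John with task 1 and one for John with task 2, however many workTime elements share each pair.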

Dimitre Novatchev 2010-08-19 05:56:02

Can you explain the design of your solution a little? It looks short and nice, but I'd like to learn as much as I can from it. Thanks.

Daniel 2010-08-19 11:10:43

@Daniel: I added an explanation.

Dimitre Novatchev 2010-08-19 13:01:36

I was wondering whether it would be better to use a simple Muenchian grouping and then check the preceding siblings for duplicates. Would that be a good solution?

Daniel 2010-08-19 17:53:48

@Daniel: If we have the power of keys, then why revert to sibling comparisons?
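
For comparison, here is a rough sketch of what a key-free duplicate check might look like. It is an assumption about the alternative being asked about, not code from this thread. Because workTime elements with the same name and @taskID sit in different Person elements in the sample XML, the sketch scans the preceding axis rather than preceding-sibling, and it does so once per workTime, whereas a key lookup avoids that repeated scan.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
   <xsl:for-each select="*/*/workTime">
     <!-- current() is the workTime being tested;
          preceding:: looks at every earlier workTime in the document -->
     <xsl:if test="not(preceding::workTime
                        [@taskID = current()/@taskID
                         and ../name = current()/../name])">
       <!-- Illustrative output: one text line per distinct pair -->
       <xsl:value-of select="concat(../name, ' / task ', @taskID, '&#10;')"/>
     </xsl:if>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```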

Dimitre Novatchev 2010-08-19 18:12:19

Answer 2

A:

Grouping in XSLT 1.0 is usually done using a technique called the Muenchian method. More details here: http://www.jenitennison.com/xslt/grouping/muenchian.html

Wilfred Springer 2010-08-19 08:47:16

Answer 3

+1 A:

Just for fun, another solution with two keys. This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kWorkTimeByName-TaskID" match="workTime" 
              use="concat(../name,'++',@taskID)"/>
    <xsl:key name="kWorkTimeByName-Date-TaskID" match="workTime" 
              use="concat(../name,'++',../date,'++',@taskID)"/>
    <xsl:template match="/">
        <xsl:variable name="vAllWorkTime" select="*/*/workTime"/>
        <result>
            <xsl:for-each select="$vAllWorkTime
                        [count(.|key('kWorkTimeByName-TaskID',
                                         concat(../name,'++',@taskID))[1])=1]">
                <xsl:sort select="../name"/>
                <xsl:sort select="@taskID" data-type="number"/>
                <Person>
                    <xsl:copy-of select="../name"/>
                    <taskID>
                        <xsl:value-of select="@taskID"/>
                    </taskID>
                    <xsl:for-each select="$vAllWorkTime
                          [count(.|key('kWorkTimeByName-Date-TaskID',
                               concat(current()/../name,'++',
                                   ../date,'++',current()/@taskID))[1])=1]">
                        <xsl:sort select="../date"/>
                        <xsl:copy>
                            <xsl:copy-of select="../date"/>
                            <time>
                                <xsl:value-of select="."/>
                            </time>
                        </xsl:copy>
                    </xsl:for-each>
                </Person>
            </xsl:for-each>
        </result>
    </xsl:template>
</xsl:stylesheet>

Output:

<result>
    <Person>
        <name>John</name>
        <taskID>1</taskID>
        <workTime>
            <date>June12</date>
            <time>34</time>
        </workTime>
        <workTime>
            <date>June13</date>
            <time>21</time>
        </workTime>
    </Person>
    <Person>
        <name>John</name>
        <taskID>2</taskID>
        <workTime>
            <date>June12</date>
            <time>12</time>
        </workTime>
        <workTime>
            <date>June13</date>
            <time>11</time>
        </workTime>
    </Person>
</result>

Alejandro 2010-08-19 16:17:00

I was wondering whether it would be better to use a simple Muenchian grouping and then check the preceding siblings for duplicates. Would that be a good solution?

Daniel 2010-08-19 18:08:03

+1 for using '++' and not '+' as I do. :)

Dimitre Novatchev 2010-08-19 18:14:06

What is the difference between using '++', '+' or no separator at all in the concat?

Daniel 2010-08-19 18:23:25

@Daniel: About the separator string: it just has to be a string that can't occur in any of the key parts, so take Dimitre's comment mostly as a joke ;) About grouping: you are grouping by Name and Task, then you are grouping by Date (so the key becomes Name, Task and Date); it makes no difference to the logic of the algorithm whether you use all the nodes of the last current group or just the first one.
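
A small sketch of why the separator matters; the literal strings 'ab', 'c', 'a' and 'bc' are made-up values chosen only to show the collision. Without a separator two different pairs of parts produce the same composite key, while with a separator that occurs in neither part they stay distinct.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
   <!-- No separator: 'ab' + 'c' and 'a' + 'bc' both yield 'abc' -->
   <xsl:value-of select="concat('ab', 'c') = concat('a', 'bc')"/>
   <xsl:text>&#10;</xsl:text>
   <!-- With '++': 'ab++c' differs from 'a++bc' -->
   <xsl:value-of select="concat('ab', '++', 'c') = concat('a', '++', 'bc')"/>
   <xsl:text>&#10;</xsl:text>
 </xsl:template>
</xsl:stylesheet>
```

Run against any input document, the first line should be true (a collision) and the second false.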

Alejandro 2010-08-19 18:44:05
