Hi, I have an XML grouping challenge: I need to group AND remove duplicates, as shown below:

<Person>
<name>John</name>
<date>June12</date>
<workTime taskID="1">34</workTime>
<workTime taskID="1">35</workTime>
<workTime taskID="2">12</workTime>
</Person>
<Person>
<name>John</name>
<date>June13</date>
<workTime taskID="1">21</workTime>
<workTime taskID="2">11</workTime>
<workTime taskID="2">14</workTime>
</Person>

Note that for each occurrence of name/taskID/date, only the first workTime is kept. In this example,

<workTime taskID="1">35</workTime>
<workTime taskID="2">14</workTime>

would be left aside.

Below is the expected output:

<Person>
<name>John</name>
<taskID>1</taskID>
<workTime>
<date>June12</date>
<time>34</time>
</workTime>
<workTime>
<date>June13</date>
<time>21</time>
</workTime>
</Person>
<Person>
<name>John</name>
<taskID>2</taskID>
<workTime>
<date>June12</date>
<time>12</time>
</workTime>
<workTime>
<date>June13</date>
<time>11</time>
</workTime>
</Person>

I have tried Muenchian grouping in XSLT 1.0 with the key below:

<xsl:key name="PersonTasks" match="workTime" use="concat(@taskID, ../name)"/>

but then how do I pick up only the first occurrence of

concat(@taskID, ../name, ../date)

? It seems that I need two levels of keys!?

+2  A: 

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kwrkTimeByNameTask" match="workTime"
  use="concat(../name, '+', @taskID)"/>

 <xsl:key name="kDateByName" match="date"
  use="../name"/>

 <xsl:key name="kwrkTimeByNameTaskDate" match="workTime"
  use="concat(../name, '+', @taskID, '+', ../date)"/>

 <xsl:template match="/">
   <xsl:for-each select=
    "*/*/workTime
           [generate-id()
           =
            generate-id(key('kwrkTimeByNameTask',
                             concat(../name, '+', @taskID)
                            )[1]
                        )
           ]
    ">
      <xsl:sort select="../name"/>
      <xsl:sort select="@taskID" data-type="number"/>

      <xsl:variable name="vcurTaskId" select="@taskID"/>
      <Person>
        <name><xsl:value-of select="../name"/></name>
        <taskID><xsl:value-of select="@taskID"/></taskID>

          <xsl:for-each select=
           "key('kDateByName', ../name)
                  [key('kwrkTimeByNameTaskDate',
                       concat(../name, '+', current()/@taskID, '+', .)
                      )
                  ]
           ">
             <workTime>
               <date><xsl:value-of select="."/></date>
               <time>
                <xsl:value-of select=
                 "key('kwrkTimeByNameTaskDate',
                  concat(../name, '+', $vcurTaskId, '+', .)
                 )"/>
               </time>
             </workTime>
          </xsl:for-each>
      </Person>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML (corrected from multiple issues to become well-formed):

<t>
    <Person>
        <name>John</name>
        <date>June12</date>
        <workTime taskID="1">34</workTime>
        <workTime taskID="1">35</workTime>
        <workTime taskID="2">12</workTime>
    </Person>
    <Person>
        <name>John</name>
        <date>June13</date>
        <workTime taskID="1">21</workTime>
        <workTime taskID="2">11</workTime>
        <workTime taskID="2">14</workTime>
    </Person>
</t>

produces the wanted, correct result:

<Person>
   <name>John</name>
   <taskID>1</taskID>
   <workTime>
      <date>June12</date>
      <time>34</time>
   </workTime>
   <workTime>
      <date>June13</date>
      <time>21</time>
   </workTime>
</Person>
<Person>
   <name>John</name>
   <taskID>2</taskID>
   <workTime>
      <date>June12</date>
      <time>12</time>
   </workTime>
   <workTime>
      <date>June13</date>
      <time>11</time>
   </workTime>
</Person>

Explanation:

  1. First we obtain all workTime elements with unique pairs of ../name, @taskID by using the Muenchian method for grouping.

  2. We sort these by ../name and @taskID -- in that order.

  3. For each such workTime we get all date elements listed with the ../name of this workTime and keep only those date elements for which there is a workTime with the same ../name, the same ../date and the same @taskID as the current workTime.

  4. In the previous step we use two different auxiliary keys: 'kDateByName' indexes all date elements by their ../name, while 'kwrkTimeByNameTaskDate' indexes all workTime elements by their ../name, their ../date and their @taskID.

So, the meaning of the following:

          <xsl:for-each select=
           "key('kDateByName', ../name)
                  [key('kwrkTimeByNameTaskDate',
                       concat(../name, '+', current()/@taskID, '+', .)
                      )
                  ]
           ">

is:

For each date for that name, such that a workTime for that name, date and @taskID (of the current workTime for the outer <xsl:for-each>) exists, do whatever is in the body of this <xsl:for-each> instruction.

Dimitre Novatchev
Can you explain the design of your solution a little? It looks short and nice, but I'd like to learn as much as I can from it. Thanks
Daniel
@Daniel: I added an explanation.
Dimitre Novatchev
I was wondering if it's not better to use a simple Muenchian grouping and then check the preceding siblings for duplicates. Would that be a good solution?
Daniel
@Daniel: If we have the power of keys, then why revert back to siblings comparisons?
Dimitre Novatchev
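For comparison, here is a minimal sketch of the sibling-based variant Daniel asks about. It assumes the same input wrapped in a single root element, and exactly one date per Person. The outer grouping is still Muenchian; only the duplicate removal is done with preceding-sibling. The key name kByNameTask is made up for this illustration.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Hypothetical key name: workTime elements indexed by name and taskID -->
 <xsl:key name="kByNameTask" match="workTime"
  use="concat(../name, '+', @taskID)"/>

 <xsl:template match="/">
   <!-- Muenchian step: one workTime per distinct name/taskID pair -->
   <xsl:for-each select=
    "*/*/workTime
       [generate-id()
       =
        generate-id(key('kByNameTask', concat(../name, '+', @taskID))[1])
       ]">
     <xsl:sort select="../name"/>
     <xsl:sort select="@taskID" data-type="number"/>
     <Person>
       <name><xsl:value-of select="../name"/></name>
       <taskID><xsl:value-of select="@taskID"/></taskID>
       <!-- All members of the group, except those that have an earlier
            sibling workTime with the same taskID in the same Person -->
       <xsl:for-each select=
        "key('kByNameTask', concat(../name, '+', @taskID))
           [not(preceding-sibling::workTime[@taskID = current()/@taskID])]">
         <workTime>
           <date><xsl:value-of select="../date"/></date>
           <time><xsl:value-of select="."/></time>
         </workTime>
       </xsl:for-each>
     </Person>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

On the sample input this should keep 34 and 21 for task 1, and 12 and 11 for task 2. The sibling test only sees duplicates inside one Person element, so it would miss duplicates if the same name and date were spread over two Person elements. The key on name, taskID and date does not have that limitation.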
A: 

Grouping in XSLT 1.0 is usually done using a technique called the Muenchian method. More details here: http://www.jenitennison.com/xslt/grouping/muenchian.html

Wilfred Springer
+1  A: 

Just for fun, another solution with two keys. This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kWorkTimeByName-TaskID" match="workTime" 
              use="concat(../name,'++',@taskID)"/>
    <xsl:key name="kWorkTimeByName-Date-TaskID" match="workTime" 
              use="concat(../name,'++',../date,'++',@taskID)"/>
    <xsl:template match="/">
        <xsl:variable name="vAllWorkTime" select="*/*/workTime"/>
        <result>
            <xsl:for-each select="$vAllWorkTime
                        [count(.|key('kWorkTimeByName-TaskID',
                                         concat(../name,'++',@taskID))[1])=1]">
                <xsl:sort select="../name"/>
                <xsl:sort select="@taskID" data-type="number"/>
                <Person>
                    <xsl:copy-of select="../name"/>
                    <taskID>
                        <xsl:value-of select="@taskID"/>
                    </taskID>
                    <xsl:for-each select="$vAllWorkTime
                          [count(.|key('kWorkTimeByName-Date-TaskID',
                               concat(current()/../name,'++',
                                   ../date,'++',current()/@taskID))[1])=1]">
                        <xsl:sort select="../date"/>
                        <xsl:copy>
                            <xsl:copy-of select="../date"/>
                            <time>
                                <xsl:value-of select="."/>
                            </time>
                        </xsl:copy>
                    </xsl:for-each>
                </Person>
            </xsl:for-each>
        </result>
    </xsl:template>
</xsl:stylesheet>

Output:

<result>
    <Person>
        <name>John</name>
        <taskID>1</taskID>
        <workTime>
            <date>June12</date>
            <time>34</time>
        </workTime>
        <workTime>
            <date>June13</date>
            <time>21</time>
        </workTime>
    </Person>
    <Person>
        <name>John</name>
        <taskID>2</taskID>
        <workTime>
            <date>June12</date>
            <time>12</time>
        </workTime>
        <workTime>
            <date>June13</date>
            <time>11</time>
        </workTime>
    </Person>
</result>
Alejandro
I was wondering if it's not better to use a simple Muenchian grouping and then check on the preceding-siblings for duplicate. Would it be a good solution?
Daniel
+1 for using '++' and not '+' as I do. :)
Dimitre Novatchev
what is the difference between '++', '+' or none in the concat?
Daniel
@Daniel: About the separator string: it just has to be a string that can't occur in either key part, so take Dimitre's comment mostly as a joke ;) About grouping: you are grouping by Name and Task, then you are grouping by Date (so the key becomes Name, Task and Date); it makes no difference to the algorithm's logic whether you use all the nodes of the last current group or just the first one.
Alejandro
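To illustrate the separator point with made-up values (they are not from the question): without a separator, two different name/taskID pairs can concatenate to the same key string. A small sketch that can be run against any XML document:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
   <!-- Hypothetical values: name 'Ann' with taskID '12'
        versus name 'Ann1' with taskID '2' -->
   <!-- No separator: both pairs give 'Ann12', so this prints true -->
   <xsl:value-of select="concat('Ann', '12') = concat('Ann1', '2')"/>
   <xsl:text>&#10;</xsl:text>
   <!-- With a separator: 'Ann++12' versus 'Ann1++2', so this prints false -->
   <xsl:value-of select="concat('Ann', '++', '12') = concat('Ann1', '++', '2')"/>
 </xsl:template>
</xsl:stylesheet>
```

With no separator the two pairs would land in the same group. Any separator works as long as it cannot appear inside the values being joined.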