For each "agency" node I need to find the "stmt" elements that have the same key1, key2 and key3 values and output just one "stmt" node for them, with their "comm" and "prem" values summed. Any "stmt" element within that "agency" that doesn't match another "stmt" on key1, key2 and key3 should be output as is. So after the transformation the first "agency" node would have only two "stmt" nodes (one of them summed), and the second "agency" node would be passed through as is because its keys don't match. XSLT 1.0 or 2.0 solutions are OK, though my stylesheet is currently 1.0. Note that an "agency" node may contain any number of "stmt" elements with matching keys that need to be grouped and summed, and any number that don't.

<statement>
<agency>
    <stmt>
        <key1>1234</key1>
        <key2>ABC</key2>
        <key3>15.000</key3>
        <comm>75.00</comm>
        <prem>100.00</prem>
    </stmt>
    <stmt>
        <key1>1234</key1>
        <key2>ABC</key2>
        <key3>15.000</key3>
        <comm>25.00</comm>
        <prem>200.00</prem>
    </stmt>
    <stmt>
        <key1>1234</key1>
        <key2>ABC</key2>
        <key3>17.50</key3>
        <comm>25.00</comm>
        <prem>100.00</prem>
    </stmt>
</agency>
<agency>
    <stmt>
        <key1>5678</key1>
        <key2>DEF</key2>
        <key3>15.000</key3>
        <comm>10.00</comm>
        <prem>20.00</prem>
    </stmt>
    <stmt>
        <key1>5678</key1>
        <key2>DEF</key2>
        <key3>17.000</key3>
        <comm>15.00</comm>
        <prem>12.00</prem>
    </stmt>
</agency>
</statement>

A:

In XSLT 1.0, use the Muenchian method for grouping (with a compound key).

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kStmtByKeys" match="stmt"
      use="concat(generate-id(..), '+', key1, '+', key2, '+', key3)"/>

 <xsl:template match="node()|@*">
   <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="agency">
   <agency>
    <xsl:for-each select=
     "stmt[generate-id()
          =
           generate-id(key('kStmtByKeys',
                           concat(generate-id(..), '+', key1, '+', key2, '+', key3)
                           )[1]
                       )
           ]
     ">
      <xsl:variable name="vkeyGroup" select=
       "key('kStmtByKeys', concat(generate-id(..), '+', key1, '+', key2, '+', key3))"/>

     <stmt>
      <xsl:copy-of select="*[starts-with(name(), 'key')]"/>
      <comm>
       <xsl:value-of select="sum($vkeyGroup/comm)"/>
      </comm>
      <prem>
       <xsl:value-of select="sum($vkeyGroup/prem)"/>
      </prem>
     </stmt>
    </xsl:for-each>
   </agency>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document, produces the wanted result:

<statement>
    <agency>
        <stmt>
            <key1>1234</key1>
            <key2>ABC</key2>
            <key3>15.000</key3>
            <comm>100</comm>
            <prem>300</prem>
        </stmt>
        <stmt>
            <key1>1234</key1>
            <key2>ABC</key2>
            <key3>17.50</key3>
            <comm>25</comm>
            <prem>100</prem>
        </stmt>
    </agency>
    <agency>
        <stmt>
            <key1>5678</key1>
            <key2>DEF</key2>
            <key3>15.000</key3>
            <comm>10</comm>
            <prem>20</prem>
        </stmt>
        <stmt>
            <key1>5678</key1>
            <key2>DEF</key2>
            <key3>17.000</key3>
            <comm>15</comm>
            <prem>12</prem>
        </stmt>
    </agency>
</statement>
Dimitre Novatchev
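The Muenchian grouping above boils down to this: within each agency, index every stmt by its compound key, keep the first member of each group, and sum comm and prem over the whole group. A minimal Python sketch of that logic, assuming the question's sample input and using only the standard library (`xml.etree.ElementTree`, plus `Decimal` so that 75.00 + 25.00 stays exact). The function names `group_agency` and `regroup` are hypothetical. A tuple is used as the key, which sidesteps the separator question discussed in the comments; like `concat()`, it compares the key values as strings.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The question's sample input (with the closing </statement> tag).
SAMPLE = """<statement>
<agency>
  <stmt><key1>1234</key1><key2>ABC</key2><key3>15.000</key3><comm>75.00</comm><prem>100.00</prem></stmt>
  <stmt><key1>1234</key1><key2>ABC</key2><key3>15.000</key3><comm>25.00</comm><prem>200.00</prem></stmt>
  <stmt><key1>1234</key1><key2>ABC</key2><key3>17.50</key3><comm>25.00</comm><prem>100.00</prem></stmt>
</agency>
<agency>
  <stmt><key1>5678</key1><key2>DEF</key2><key3>15.000</key3><comm>10.00</comm><prem>20.00</prem></stmt>
  <stmt><key1>5678</key1><key2>DEF</key2><key3>17.000</key3><comm>15.00</comm><prem>12.00</prem></stmt>
</agency>
</statement>"""

KEYS = ("key1", "key2", "key3")


def group_agency(agency):
    """Return [(keys, comm_total, prem_total)] for one <agency>, in first-seen order."""
    # A fresh index per agency plays the role of the generate-id(..) prefix:
    # equal keys in different agencies are never merged.
    groups = {}
    for stmt in agency.findall("stmt"):
        # The tuple is the compound key; the values are compared as strings.
        k = tuple(stmt.findtext(name) for name in KEYS)
        comm, prem = groups.get(k, (Decimal(0), Decimal(0)))
        groups[k] = (comm + Decimal(stmt.findtext("comm")),
                     prem + Decimal(stmt.findtext("prem")))
    # dict preserves insertion order, so each group sits where its first member was.
    return [(k, c, p) for k, (c, p) in groups.items()]


def regroup(xml_text):
    root = ET.fromstring(xml_text)
    return [group_agency(agency) for agency in root.findall("agency")]


for agency in regroup(SAMPLE):
    print(agency)
```

Running it prints one list per agency. The first list holds two groups (the summed 15.000 group and the single 17.50 group). The second list holds the two unmatched stmt entries unchanged.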
If I understood the question correctly, your solution breaks when another agency has `stmt` nodes with the same keys. Since there are multiple agencies, it seems to me that the Muenchian method with a global key isn't going to work.
Lucero
@Lucero: A good observation, thanks. This is now corrected, and I am still using the Muenchian method with a compound key.
Dimitre Novatchev
Hm, is this way of generating a compound key guaranteed to give the wanted results in all situations? If one key was `concat('1', '23')` and another was `concat('12', '3')` (you get the idea), this may cause problems depending on the input document and the XSLT processor.
Lucero
Thank you both for the detailed answers and for pointing out the pitfalls. Concatenation would work for my data. I'll look over these options more closely to determine the best one for my current and future data.
johkar
@Lucero: Do you notice the "breaking" `'+'` in the arguments to `concat()`? *This* is what guarantees that conflicts are avoided. Of course, we must be sure to choose a string that will never occur at the end and/or beginning of the actual data values that are concatenated to form the key.
Dimitre Novatchev
@Dimitre, yes, I saw it, but I didn't have the exact spec of the generate-id() function's output at hand, which is also why I wrote this as a question ("in all situations?"). But you're right: the "+" character cannot be part of a generated ID, which makes it a suitable separator here. http://www.w3.org/TR/xslt#function-generate-id
Lucero
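The collision discussed in these comments can be checked with plain string joining, which behaves like XPath `concat()`. A small Python sketch, where the `concat_key` helper is hypothetical. The generate-id() values are also made up for illustration, because the real format is processor-specific; the XSLT 1.0 spec only guarantees ASCII alphanumeric characters starting with a letter.

```python
def concat_key(parts, sep=""):
    """Mimic XPath concat(p1, sep, p2, sep, ...) for a list of string values."""
    return sep.join(parts)


# Lucero's example: without a separator the field boundaries are lost.
no_sep = concat_key(["1", "23"]) == concat_key(["12", "3"])              # "123" vs "123"
with_sep = concat_key(["1", "23"], "+") == concat_key(["12", "3"], "+")  # "1+23" vs "12+3"

# Same idea for a generate-id(..) prefix. These IDs are hypothetical: if one
# generated ID is a prefix of another, an unseparated key1 can make keys from
# two different agencies collide.
id_no_sep = concat_key(["d1e3", "41234"]) == concat_key(["d1e34", "1234"])
id_with_sep = concat_key(["d1e3", "41234"], "+") == concat_key(["d1e34", "1234"], "+")

print(no_sep, with_sep, id_no_sep, id_with_sep)  # prints: True False True False
```

Because "+" can never appear in a generated ID, placing it directly after `generate-id(..)` also rules out the second kind of collision.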
A: 
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/|*">
        <xsl:copy>
            <xsl:apply-templates select="*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="stmt">
        <xsl:variable name="stmtGroup" select="../stmt[(key1=current()/key1) and (key2=current()/key2) and (key3=current()/key3)]" />
        <xsl:if test="generate-id()=generate-id($stmtGroup[1])">
            <xsl:copy>
                <key1>
                    <xsl:value-of select="key1"/>
                </key1>
                <key2>
                    <xsl:value-of select="key2"/>
                </key2>
                <key3>
                    <xsl:value-of select="key3"/>
                </key3>
                <comm>
                    <xsl:value-of select="format-number(sum($stmtGroup/comm), '#.00')"/>
                </comm>
                <prem>
                    <xsl:value-of select="format-number(sum($stmtGroup/prem), '#.00')"/>
                </prem>
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
Lucero
A:

And an XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 exclude-result-prefixes="xs"
 >
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
   <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="agency">
  <agency>
   <xsl:for-each-group select="stmt" group-by=
    "concat(key1, '+', key2, '+', key3)">

    <stmt>
      <xsl:copy-of select=
       "current-group()[1]/*[starts-with(name(),'key')]"/>

       <comm>
         <xsl:value-of select="sum(current-group()/comm)"/>
       </comm>
       <prem>
         <xsl:value-of select="sum(current-group()/prem)"/>
       </prem>
    </stmt>
   </xsl:for-each-group>
  </agency>
 </xsl:template>
</xsl:stylesheet>
Dimitre Novatchev
The `concat(key1,key2,key3)` is gonna fail in certain cases, for instance `key1="1A" key2="B" key3="1.000"` and `key1="1" key2="AB" key3="1.000"`... I feel that concatenating strings without intimate knowledge of their contents (or restriction thereof) is wrong.
Lucero
@Lucero: Thanks again. There is nothing wrong with the concat itself; I simply left out the separator (I've been feeling sleepy all day today), and this is now corrected. Please do let me know whether the correction satisfies you. Adding a separator like this is typical in such solutions.
Dimitre Novatchev
@Dimitre, in contrast to the other `concat` issue the `+` isn't a suitable separator here, since the XML data may theoretically very well have key strings with `+` in them - think of `key1="1+" key2="2"` and `key1="1" key2="+2"`. So my saying is that you should only concat when you know that the separator will never be part of the concatenated data.
Lucero
@Lucero: While this is true in principle, people who use this method are well aware of the possible problem. Only they know the value space of their data, and they can usually choose the separator in a well-informed manner. Solutions on XSLT-related forums commonly use `"|"` as the breaking string, although people know that in some cases this might not be a good choice. Anyway, thanks for your insistent reminder, although this isn't something new.
Dimitre Novatchev
@Lucero: (Cont.): I use `"+"` consistently because I believe it (symbolically) expresses the nature of the concatenation operation. If this issue is really that important to you, why not use whatever you consider a really rare string? Something like `'!+|@#$%^*()'`.
Dimitre Novatchev
@Dimitre, you wrote "people who use this method are well aware of the possible problem". On a site like SO, where the person asking the question, as well as people searching the site, may not know the technique, I feel that it is important to make readers aware of any limitations or things to keep in mind when using a specific solution. I was only trying to point this out.
Lucero
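Lucero's point about `+` inside the data can be demonstrated the same way: a separator is only safe if it never occurs in the values at all. In the Python sketch below, the hypothetical `concat_key` helper stands in for `concat(key1, '+', key2)`. The second pair shows that an occurrence in the middle of a value collides too, not only one at the edges. Comparing the fields individually cannot collide this way, which is what the field-by-field predicate in Lucero's XSLT 1.0 answer does.

```python
def concat_key(parts, sep="+"):
    """Mimic XPath concat(key1, '+', key2, ...) for a tuple of string values."""
    return sep.join(parts)


# Lucero's example: '+' at the end of key1 and at the start of key2.
a, b = ("1+", "2"), ("1", "+2")
# A '+' in the middle of a value is enough as well.
c, d = ("1+2", "3"), ("1", "2+3")

print(concat_key(a) == concat_key(b))  # True: both are "1++2"
print(concat_key(c) == concat_key(d))  # True: both are "1+2+3"

# Comparing the fields themselves (here as tuples) keeps the records apart.
print(a == b, c == d)  # False False
```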