OK, I want to apply an XSL stylesheet that, for each "ROLE" node, outputs its @name followed by the number of unique ROLE names seen up to and including the current node. I've wasted several hours on what should be an easy thing to implement. I have tried several approaches, including the Muenchian method, if/with variables (you can't increment a variable), and applying templates to templates, all to no avail.

I have the following XML:

<ROLEACTIONINFO>
  <ROLE name="TESTER" /> 
  <ROLE name="PARENT1"/>
  <ROLE name="PARENT1"/>
  <ROLE name="PARENT1"/>
  <ROLE name="PARENT2"/>
  <ROLE name="PARENT2"/>
  <ROLE name="PARENT3"/>
  <ROLE name="PARENT4"/>
  <ROLE name="TESTROLE"/>
</ROLEACTIONINFO>

OUTPUT EXAMPLE:

TESTER  1
PARENT1 2
PARENT1 2
PARENT1 2
PARENT2 3
PARENT2 3
PARENT3 4
PARENT4 5
TESTROLE  6

Getting the count of the unique preceding nodes is my problem. Any help would be appreciated.

A: 

Recursion usually works pretty well with problems like this.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" media-type="text/plain" />

    <xsl:template name="count-previous-but-not-with-my-name">
        <xsl:param name="nodes" />
        <xsl:param name="count" select="0" />
        <xsl:choose>
            <xsl:when test="count($nodes) = 0">
                <xsl:value-of select="$count" />
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="last-name" select="$nodes[last()]/@name" />
                <xsl:variable name="nodes-before-me-without-my-name" select="$nodes[position() &lt; last() and @name != $last-name]" />
                <xsl:call-template name="count-previous-but-not-with-my-name">
                    <xsl:with-param name="nodes" select="$nodes-before-me-without-my-name" />
                    <xsl:with-param name="count" select="$count + 1" />
                </xsl:call-template>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template match="/">
        <xsl:for-each select="//ROLEACTIONINFO/ROLE">
            <xsl:variable name="role" select="current()" />
            <xsl:variable name="my-pos" select="position()" />
            <xsl:value-of select="current()/@name" /><xsl:text> </xsl:text>
            <xsl:call-template name="count-previous-but-not-with-my-name">
                <xsl:with-param name="nodes" select="$role/../ROLE[position() &lt;= $my-pos]" />
            </xsl:call-template>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
Steef
+4  A: 

This can be solved pretty easily using XPath. Here's the expression you're looking for: count((.|preceding-sibling::ROLE)[not(@name = preceding-sibling::ROLE/@name)])

This can be broken down to make it more readable, as I've done in the following XSLT 1.0 stylesheet:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- don't copy whitespace -->
  <xsl:template match="text()"/>

  <xsl:template match="ROLE">
    <xsl:variable name="roles-so-far" select=". | preceding-sibling::ROLE"/>
    <!-- Only select the first instance of each ROLE name -->
    <xsl:variable name="roles-so-far-unique"
                  select="$roles-so-far[not(@name = preceding-sibling::ROLE/@name)]"/>
    <xsl:apply-templates select="@name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($roles-so-far-unique)"/>
    <xsl:text>&#xA;</xsl:text> <!-- linefeed -->
  </xsl:template>

</xsl:stylesheet>

Here's an alternative implementation, using the Muenchian method. First, declare a key:

<xsl:key name="roles" match="ROLE" use="@name"/>

Then, replace the definition of $roles-so-far-unique with something like this:

<!-- Among all the ROLEs having one of the names so far,
     select only the first one for each name -->
<xsl:variable name="roles-so-far-unique"
              select="../ROLE[@name = $roles-so-far/@name]
                             [generate-id(.) = generate-id(key('roles',@name)[1])]"/>

This code, of course, is more complicated. Unless you have a large data set requiring you to speed up processing using the Muenchian method (even then I would test to make sure it buys you anything), you might as well stick with the simpler version above.
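For reference, here is a sketch of how the Muenchian pieces might fit together in one stylesheet. It assumes the sample input from the question (a single ROLEACTIONINFO element) and only combines the key declaration and the variable definition shown above; the use of xsl:value-of for the name is a small simplification, not a requirement.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- index every ROLE by its name -->
  <xsl:key name="roles" match="ROLE" use="@name"/>

  <!-- don't copy whitespace -->
  <xsl:template match="text()"/>

  <xsl:template match="ROLE">
    <xsl:variable name="roles-so-far" select=". | preceding-sibling::ROLE"/>
    <!-- Among all the ROLEs having one of the names so far,
         select only the first one for each name -->
    <xsl:variable name="roles-so-far-unique"
                  select="../ROLE[@name = $roles-so-far/@name]
                                 [generate-id(.) = generate-id(key('roles',@name)[1])]"/>
    <xsl:value-of select="@name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($roles-so-far-unique)"/>
    <xsl:text>&#xA;</xsl:text> <!-- linefeed -->
  </xsl:template>

</xsl:stylesheet>
```

Because the key indexes every ROLE in the document, this sketch would need an extra parent check if the input contained more than one ROLEACTIONINFO element.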

Finally, in XSLT 2.0, it's much easier. Simply replace the $roles-so-far-unique definition with the following:

<!-- Return a list of distinct string values, with duplicates removed -->
<xsl:variable name="roles-so-far-unique"
              select="distinct-values($roles-so-far/@name)"/>
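Put together, a complete XSLT 2.0 version might look like the sketch below. It assumes an XSLT 2.0 processor (Saxon, for example) and the sample input from the question. Note that $roles-so-far-unique now holds a sequence of strings rather than a node-set, but count() works on it just the same.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- don't copy whitespace -->
  <xsl:template match="text()"/>

  <xsl:template match="ROLE">
    <xsl:variable name="roles-so-far" select=". | preceding-sibling::ROLE"/>
    <!-- Return a list of distinct string values, with duplicates removed -->
    <xsl:variable name="roles-so-far-unique"
                  select="distinct-values($roles-so-far/@name)"/>
    <xsl:value-of select="@name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($roles-so-far-unique)"/>
    <xsl:text>&#xA;</xsl:text> <!-- linefeed -->
  </xsl:template>

</xsl:stylesheet>
```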

I hope this has helped you identify where you went wrong in the various attempts that you mentioned.

Evan Lenz
+1 from me. The Muenchian XPath expression should be "$roles-so-far[generate-id(.) = generate-id(key('roles',@name)[1])]", though. What are you trying to do with "../ROLE[@name = $roles-so-far/@name]"?
Tomalak
The node-set to be filtered is only a subset of the nodes indexed by xsl:key. Your simplification works because of how $roles-so-far is defined. But if I change the definition of $roles-so-far (e.g., so that it lists every @name *after* rather than before), then it would be wrong. The first indexed node for a given value won't necessarily be in the subset. That said, it's just one local variable following another, so I think such coupling is fine. I might think differently if it was defined elsewhere (globally). I approve your simplification, but I'll leave it so these comments make sense.
Evan Lenz
It normally doesn't matter whether you implement the Muenchian Method using [1] or [last()]. The *intention* is to get one result for each value, not to get the first. I didn't like the [1] being suddenly and silently intentional; I wanted it to remain an implementation detail. :-)
Evan Lenz
I guess what I don't like about it (now that I understand your intent) is its inherent potential to make the check more inefficient than needed. Imagine there are *many* following ROLE nodes all of which have a name that also is in $roles-so-far. This would mean the engine would do many useless checks against ROLE nodes that will never be selected because they already have been.
Tomalak
Yep, that's a valid critique, especially for large data sets.
Evan Lenz
+3  A: 

This is easily solved with an <xsl:key>:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <xsl:output method="text" />

  <xsl:key name="kRole" match="ROLE" use="@name" />

  <xsl:template match="ROLE">
    <xsl:value-of select="concat(@name, ' ')" />
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        count(. | key('kRole', @name)[1]) = 1
      ])" />
  </xsl:template>

</xsl:stylesheet>

Output is as desired:

TESTER 1
PARENT1 2
PARENT1 2
PARENT1 2
PARENT2 3
PARENT2 3
PARENT3 4
PARENT4 5
TESTROLE 6

Explanation of the XPath expression in the <xsl:value-of>:

count(                          # count the nodes:
(. | preceding-sibling::ROLE)   # union of this node and its predecessors
[                               # where...
  count(                        # the count of the union of...
    . |                         #   this node and
    key('kRole', @name)[1]      #   the first node with the same @name
  ) = 1                         # is 1
]
)

This is the Muenchian method. Based on the fact that a node set cannot contain the same node twice, a union of two nodes has a node count of 1 if they are the same node. This way we are selecting the unique nodes from (. | preceding-sibling::ROLE) only.
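The same node-identity test can also be written with generate-id(), which some people find more readable. A sketch, assuming the same kRole key and the sample input from the question; the text() template and the explicit linefeed are additions for tidy output and are not part of the stylesheet above:

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <xsl:output method="text" />

  <xsl:key name="kRole" match="ROLE" use="@name" />

  <!-- suppress the whitespace between elements -->
  <xsl:template match="text()" />

  <xsl:template match="ROLE">
    <xsl:value-of select="concat(@name, ' ')" />
    <!-- a node is kept if it is the first node in the key for its name -->
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        generate-id() = generate-id(key('kRole', @name)[1])
      ])" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Both forms answer the question "is this node the same node as the first one in the key?"; the count() form does it through the union, the generate-id() form through the generated identifier.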

If there is more than one <ROLEACTIONINFO> element in your document, a parent check is missing. This is also easily achieved:

  <xsl:template match="ROLE">
    <xsl:variable name="parentId" select="generate-id(..)" />
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        count(. | key('kRole', @name)[generate-id(..) = $parentId][1]) = 1
      ])" />
  </xsl:template>

Note that [generate-id(..) = $parentId][1] != [1][generate-id(..) = $parentId].

Order is important when chaining predicates. The former checks for parent node equality first and then takes the first unique node from the reduced set. This is what we want.

The latter takes the first node from the set (all ROLE nodes with a given name throughout the document) and then keeps or discards it based on parent equality. This is wrong.
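To see the parent check at work, here is a complete sketch run against a made-up input with two <ROLEACTIONINFO> elements. The ROOT wrapper and the role names A and B are hypothetical, chosen only for illustration; the text() template and the linefeed are again additions for tidy output.

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <!-- Hypothetical input:
       <ROOT>
         <ROLEACTIONINFO>
           <ROLE name="A"/>
           <ROLE name="B"/>
         </ROLEACTIONINFO>
         <ROLEACTIONINFO>
           <ROLE name="B"/>
           <ROLE name="B"/>
           <ROLE name="A"/>
         </ROLEACTIONINFO>
       </ROOT>
  -->

  <xsl:output method="text" />

  <xsl:key name="kRole" match="ROLE" use="@name" />

  <!-- suppress the whitespace between elements -->
  <xsl:template match="text()" />

  <xsl:template match="ROLE">
    <xsl:variable name="parentId" select="generate-id(..)" />
    <xsl:value-of select="concat(@name, ' ')" />
    <!-- filter the key result by parent first, then take the first node -->
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        count(. | key('kRole', @name)[generate-id(..) = $parentId][1]) = 1
      ])" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For this input the counts restart in the second group: A 1, B 2, then B 1, B 1, A 2. With the predicates swapped, the key lookup for the second group's B nodes would pick the B from the first group, discard it on the parent test, and leave an empty set, so every node would pass the count(...) = 1 test and duplicates would be counted.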

Tomalak
Nice answer, although you seem undecided about whether you prefer generate-id() or the count(.|$ns)=1 approach for determining node identity, using both in the same expression. :-) I personally prefer generate-id() since it's more descriptive, leaving count(.|$ns) to when it's indispensable (for getting the intersection of two node-sets), i.e. count(.|$ns)=count($ns)
Evan Lenz
I like both approaches, though usually I go with the generate-id() method for exactly the same reasons as you. I wanted to post an alternative to your solution, so I went with count() along with an explanation here. :-) For the "parent check" I chose generate-id() again because it is simply more expressive... Something like "count(. | key('kRole', @name)[count(.. | $parent) = 1][1]) = 1" is quite a mess.
Tomalak
OK, so first let me say thanks to everyone for the explanations; I learned a lot. However, I oversimplified my example and I am still a bit confused. I would like to post a follow-up with a more detailed example. Should this be posted as a new question (since the original question I asked was answered) or somehow attached to this original question?
Jay
This depends on *how much* your real XML deviates from your example. If it is just a bit unclear to you, we can try to clarify here. If it is a lot more complex, a new question would be advisable.
Tomalak
I went ahead and opened a new question because the complexity changed a few things. Here is the new question for those following: http://stackoverflow.com/questions/946302/xsl-counting-previous-unique-siblings-from-child-nodes
Jay