ansaurus

Question

Answer 1

A:

Recursion usually works pretty well with problems like this.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" media-type="text/plain" />

    <xsl:template name="count-previous-but-not-with-my-name">
        <xsl:param name="nodes" />
        <xsl:param name="count" select="0" />
        <xsl:choose>
            <xsl:when test="count($nodes) = 0">
                <xsl:value-of select="$count" />
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="last-name" select="$nodes[last()]/@name" />
                <xsl:variable name="nodes-before-me-without-my-name" select="$nodes[position() &lt; last() and @name != $last-name]" />
                <xsl:call-template name="count-previous-but-not-with-my-name">
                    <xsl:with-param name="nodes" select="$nodes-before-me-without-my-name" />
                    <xsl:with-param name="count" select="$count + 1" />
                </xsl:call-template>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template match="/">
        <xsl:for-each select="//ROLEACTIONINFO/ROLE">
            <xsl:variable name="role" select="current()" />
            <xsl:variable name="my-pos" select="position()" />
            <xsl:value-of select="current()/@name" /><xsl:text> </xsl:text>
            <xsl:call-template name="count-previous-but-not-with-my-name">
                <xsl:with-param name="nodes" select="$role/../ROLE[position() &lt;= $my-pos]" />
            </xsl:call-template>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

Steef 2009-06-02 21:44:18

Answer 2

+4 A:

This can be solved pretty easily using XPath. Here's the expression you're looking for: count((.|preceding-sibling::ROLE)[not(@name = preceding-sibling::ROLE/@name)])

This can be broken down to make it more readable, as I've done in the following XSLT 1.0 stylesheet:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- don't copy whitespace -->
  <xsl:template match="text()"/>

  <xsl:template match="ROLE">
    <xsl:variable name="roles-so-far" select=". | preceding-sibling::ROLE"/>
    <!-- Only select the first instance of each ROLE name -->
    <xsl:variable name="roles-so-far-unique"
                  select="$roles-so-far[not(@name = preceding-sibling::ROLE/@name)]"/>
    <xsl:apply-templates select="@name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($roles-so-far-unique)"/>
    <xsl:text>&#xA;</xsl:text> <!-- linefeed -->
  </xsl:template>

</xsl:stylesheet>

Here's an alternative implementation, using the Muenchian method. First, declare a key:

<xsl:key name="roles" match="ROLE" use="@name"/>

Then, replace the definition of $roles-so-far-unique with something like this:

<!-- Among all the ROLEs having one of the names so far,
     select only the first one for each name -->
<xsl:variable name="roles-so-far-unique"
              select="../ROLE[@name = $roles-so-far/@name]
                             [generate-id(.) = generate-id(key('roles',@name)[1])]"/>

This code, of course, is more complicated. Unless you have a large data set requiring you to speed up processing using the Muenchian method (even then I would test to make sure it buys you anything), you might as well stick with the simpler version above.

Finally, in XSLT 2.0, it's much easier. Simply replace the $roles-so-far-unique definition with the following:

<!-- Return a list of distinct string values, with duplicates removed -->
<xsl:variable name="roles-so-far-unique"
              select="distinct-values($roles-so-far/@name)"/>
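
Put together, a complete XSLT 2.0 version might look like the sketch below. It assumes an XSLT 2.0 processor and the same ROLE/@name structure as the stylesheet above; the variable name names-so-far is only illustrative. Note that distinct-values() returns a sequence of strings rather than nodes, which is fine here because only the count is needed.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- don't copy whitespace -->
  <xsl:template match="text()"/>

  <xsl:template match="ROLE">
    <!-- Names of this ROLE and all earlier sibling ROLEs, duplicates removed.
         "names-so-far" is an illustrative name, not from the original answer. -->
    <xsl:variable name="names-so-far"
                  select="distinct-values((. | preceding-sibling::ROLE)/@name)"/>
    <xsl:value-of select="@name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($names-so-far)"/>
    <xsl:text>&#xA;</xsl:text> <!-- linefeed -->
  </xsl:template>

</xsl:stylesheet>
```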

I hope this has helped you identify where you went wrong in the various attempts that you mentioned.

Evan Lenz 2009-06-03 07:44:02

+1 from me. The Muenchian XPath expression should be "$roles-so-far[generate-id(.) = generate-id(key('roles',@name)[1])]", though. What are you trying to do with "../ROLE[@name = $roles-so-far/@name]"?

Tomalak 2009-06-03 13:13:23

The node-set to be filtered is only a subset of the nodes indexed by xsl:key. Your simplification works because of how $roles-so-far is defined. But if I change the definition of $roles-so-far (e.g., so that it lists every @name *after* rather than before), then it would be wrong. The first indexed node for a given value won't necessarily be in the subset. That said, it's just one local variable following another, so I think such coupling is fine. I might think differently if it was defined elsewhere (globally). I approve your simplification, but I'll leave it so these comments make sense.

Evan Lenz 2009-06-03 15:41:55

It normally doesn't matter whether you implement the Muenchian Method using [1] or [last()]. The *intention* is to get one result for each value, not to get the first. I didn't like the [1] being suddenly and silently intentional; I wanted it to remain an implementation detail. :-)

Evan Lenz 2009-06-03 15:45:39

I guess what I don't like about it (now that I understand your intent) is its inherent potential to make the check more inefficient than needed. Imagine there are *many* following ROLE nodes, all of which have a name that is also in $roles-so-far. This would mean the engine would do many useless checks against ROLE nodes that will never be selected because they already have been.

Tomalak 2009-06-03 17:34:29

Yep, that's a valid critique, especially for large data sets.

Evan Lenz 2009-06-05 00:03:35

Answer 3

+3 A:

This is easily solved with an <xsl:key>:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

  <xsl:output method="text" />

  <xsl:key name="kRole" match="ROLE" use="@name" />

  <xsl:template match="ROLE">
    <xsl:value-of select="concat(@name, ' ')" />
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        count(. | key('kRole', @name)[1]) = 1
      ])" />
  </xsl:template>

</xsl:stylesheet>

Output is as desired:

TESTER 1
PARENT1 2
PARENT1 2
PARENT1 2
PARENT2 3
PARENT2 3
PARENT3 4
PARENT4 5
TESTROLE 6
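
The question's XML is not reproduced here, so the following input is an assumption reconstructed from the element and attribute names used in the answers and from the output above. A document along these lines would yield those counts:

```xml
<!-- Assumed sample input; the real document may contain more attributes or children -->
<ROLEACTIONINFO>
  <ROLE name="TESTER"/>
  <ROLE name="PARENT1"/>
  <ROLE name="PARENT1"/>
  <ROLE name="PARENT1"/>
  <ROLE name="PARENT2"/>
  <ROLE name="PARENT2"/>
  <ROLE name="PARENT3"/>
  <ROLE name="PARENT4"/>
  <ROLE name="TESTROLE"/>
</ROLEACTIONINFO>
```

Since the ROLE template above emits no linefeed of its own, the line breaks in the output come from the whitespace text nodes between the ROLE elements, which the built-in templates copy through.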

Explanation of the XPath expression in the <xsl:value-of>:

count(                          # count the nodes:
(. | preceding-sibling::ROLE)   # union of this node and its predecessors
[                               # where...
  count(                        # the count of the union of...
    . |                         #   this node and
    key('kRole', @name)[1]      #   the first node with the same @name
  ) = 1                         # is 1
]
)

This is the Muenchian method. Based on the fact that a node set cannot contain the same node twice, a union of two nodes has a node count of 1 if they are the same node. This way we are selecting the unique nodes from (. | preceding-sibling::ROLE) only.
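
For comparison, the same count can be written with generate-id() as the node identity test instead of count(. | ...) = 1. This is only a sketch of the equivalent form; it reuses the kRole key from above and should give the same numbers.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <xsl:key name="kRole" match="ROLE" use="@name" />

  <xsl:template match="ROLE">
    <xsl:value-of select="concat(@name, ' ')" />
    <!-- keep a node only if it is the first node in the key for its @name -->
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        generate-id(.) = generate-id(key('kRole', @name)[1])
      ])" />
  </xsl:template>

</xsl:stylesheet>
```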

If there is more than one <ROLEACTIONINFO> element in your document, a parent check is missing. It is also easy to add:

  <xsl:template match="ROLE">
    <xsl:variable name="parentId" select="generate-id(..)" />
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        count(. | key('kRole', @name)[generate-id(..) = $parentId][1]) = 1
      ])" />
  </xsl:template>

Note that [generate-id(..) = $parentId][1] != [1][generate-id(..) = $parentId].

Order is important when chaining predicates. The former checks for parent node equality first and then takes the first unique node from the reduced set. This is what we want.

The latter takes the first node from the whole set (all ROLE nodes with a given name throughout the document) and then keeps or discards it based on parent equality. This is wrong.
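
Here is a sketch of the parent check inside a complete stylesheet. Unlike the snippet above, it also prints the name and an explicit linefeed, and it suppresses whitespace text nodes; those additions are mine, not part of the original template.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <xsl:key name="kRole" match="ROLE" use="@name" />

  <!-- suppress whitespace between elements; linefeeds are written explicitly -->
  <xsl:template match="text()" />

  <xsl:template match="ROLE">
    <xsl:variable name="parentId" select="generate-id(..)" />
    <xsl:value-of select="concat(@name, ' ')" />
    <!-- restrict the key lookup to ROLEs under the same parent BEFORE taking [1] -->
    <xsl:value-of select="count(
      (. | preceding-sibling::ROLE)[
        count(. | key('kRole', @name)[generate-id(..) = $parentId][1]) = 1
      ])" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

As a hypothetical test, take two <ROLEACTIONINFO> elements under a common root, with ROLE names A, B, A in the first and B, B, C in the second. The counts should then restart per parent: 1, 2, 2 for the first group and 1, 1, 2 for the second. Without the parent filter, the first B of the second group would be compared against the B in the first group and would not be counted at all.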

Tomalak 2009-06-03 12:47:17

Nice answer, although you seem undecided about whether you prefer generate-id() or the count(.|$ns)=1 approach for determining node identity, using both in the same expression. :-) I personally prefer generate-id() since it's more descriptive, leaving count(.|$ns) to when it's indispensable (for getting the intersection of two node-sets), i.e. count(.|$ns)=count($ns)

Evan Lenz 2009-06-03 16:22:12
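
To illustrate the intersection idiom mentioned in this comment, here is a sketch applied to the same problem. It assumes a single <ROLEACTIONINFO> parent, and the variable names so-far and firsts are made up for the example. A node of $so-far is kept only if adding it to $firsts does not enlarge $firsts, i.e. it is already a member.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:key name="kRole" match="ROLE" use="@name"/>

  <xsl:template match="text()"/>

  <xsl:template match="ROLE">
    <!-- this ROLE and every earlier sibling ROLE -->
    <xsl:variable name="so-far" select=". | preceding-sibling::ROLE"/>
    <!-- the first ROLE for each distinct name (Muenchian selection) -->
    <xsl:variable name="firsts"
                  select="../ROLE[generate-id(.) = generate-id(key('kRole', @name)[1])]"/>
    <xsl:value-of select="concat(@name, ' ')"/>
    <!-- intersection: nodes of $so-far that are also in $firsts -->
    <xsl:value-of select="count($so-far[count(. | $firsts) = count($firsts)])"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```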

I like both approaches, though usually I go with the generate-id() method for exactly the same reasons as you. I wanted to post an alternative to your solution, so I went with count() along with an explanation here. :-) For the "parent check" I chose generate-id() again because it is simply more expressive... Something like "count(. | key('kRole', @name)[count(.. | $parent) = 1][1]) = 1" is quite a mess.

Tomalak 2009-06-03 17:05:25

OK, so first let me say thanks to everyone for the explanations; I learned a lot. However, I oversimplified my example and I am still a bit confused. I would like to post a follow-up with a more detailed example. Should this be posted as a new question (since the original question I asked was answered) or somehow attached to this original question?

Jay 2009-06-03 17:17:51

This depends on *how much* your real XML deviates from your example. If it is just a bit unclear to you, we can try to clarify here. If it is a lot more complex, a new question would be advisable.

Tomalak 2009-06-03 17:26:01

I went ahead and opened a new question because the complexity changed a few things. Here is the new question for those following: http://stackoverflow.com/questions/946302/xsl-counting-previous-unique-siblings-from-child-nodes

Jay 2009-06-03 18:14:50

XSL: Counting Previous Unique Siblings