ansaurus

Question

Finding Directed Acyclic Graph (DAG) Minimal Elements (Vertices) with XSLT/XPath?

Answer 1

You can take advantage of XPath's implicit existential quantification on the = operator:

<xsl:for-each select="//vertex[not(@name = //vertex/directed-edge-to/@vertex)]">

When you use any of the six comparison operators (=, !=, <, <=, >, and >=) to compare a node-set, the expression will return true if any node in the node-set satisfies the condition. When comparing one node-set with another, the expression returns true if any node in the first node-set satisfies the condition when compared with any node in the second node-set. XPath 2.0 introduces six new operators that don't perform this existential quantification (eq, ne, lt, le, gt, and ge). But in your case, you'll want to use "=" to get that existential quantification.

Note, of course, that you'll still want to use the not() function as you were doing. Most of the time, it's good to avoid the != operator. If you used it here instead of not(), then it would return true if there are any @vertex attributes that are not equal to the @name value, which is not your intention. (And if either node-set is empty, then it would return false, as comparisons with empty node-sets always return false.)
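The following small Python model makes these rules concrete by showing how `=`, `!=`, and XPath 2.0's `eq` treat node-sets. It is a sketch, not an XPath engine. It assumes a node-set can be represented as a plain list of string values and ignores XPath's number and boolean conversions. The `targets` list mirrors the `@vertex` values of the sample DAG shown in the second answer.

```python
# A toy Python model of XPath comparison semantics (not a real XPath engine).
# Assumption: a node-set is modelled as a list of string values.

def general_eq(left, right):
    """XPath '=': true if SOME pair (a, b) satisfies a == b (existential)."""
    return any(a == b for a in left for b in right)

def general_ne(left, right):
    """XPath '!=': true if SOME pair (a, b) satisfies a != b (also existential)."""
    return any(a != b for a in left for b in right)

def value_eq(left, right):
    """XPath 2.0 'eq': no quantification; each side may hold at most one value."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs a single value on each side")
    if not left or not right:
        return None  # an empty operand yields an empty sequence (treated as false)
    return left[0] == right[0]

# //directed-edge-to/@vertex in the sample DAG, in document order
targets = ["C", "C", "D", "E", "E", "G", "G"]

print(general_eq(["C"], targets))       # True: C has an incoming edge
print(not general_eq(["A"], targets))   # True: A is minimal
print(general_ne(["C"], targets))       # True: the != trap, "C" differs from "D"
print(general_eq([], targets), general_ne([], targets))  # False False
print(value_eq(["A"], ["A"]))           # True
```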

If you want to use eq instead, then you'd have to do something like you did: separate out the conditional from the iteration so you could bind current(). But in XPath 2.0, you can do this within an expression:

<xsl:for-each select="for $v in //vertex
                      return $v[not(//directed-edge-to[@vertex eq $v/@name])]">

This is useful for when your condition isn't a simple equality comparison (and thus can't be existentially quantified using "="). For example: starts-with(@vertex, $v/@name).
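A rough Python analogue of the `for` expression shows why binding `$v` matters. Once each vertex is held in a variable, the inner test can be any two-argument condition, not just equality. `DAG` is an assumed in-memory stand-in for the sample document in the second answer. `minimal_vertices` is a hypothetical helper, not an XPath function.

```python
# Rough Python analogue of:
#   for $v in //vertex return $v[not(//directed-edge-to[COND])]
# DAG is an assumed stand-in for the sample document: name -> edge targets.
DAG = {
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
    "E": ["G"],
    "F": ["G"],
    "G": [],
}

def minimal_vertices(dag, cond):
    """Return each vertex v for which no edge target t satisfies cond(t, v)."""
    edge_targets = [t for targets in dag.values() for t in targets]
    result = []
    for v in dag:  # for $v in //vertex (dicts keep insertion order)
        # $v stays bound while every edge is tested, so cond can be any test
        if not any(cond(t, v) for t in edge_targets):
            result.append(v)
    return result

# @vertex eq $v/@name
print(minimal_vertices(DAG, lambda target, name: target == name))  # ['A', 'B', 'F']

# starts-with(@vertex, $v/@name): not expressible with a bare "="
print(minimal_vertices(DAG, lambda target, name: target.startswith(name)))
```

With the single-letter names in the sample, the starts-with() variant happens to give the same answer as eq. The two would differ only when one vertex name is a prefix of another.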

XPath 2.0 also has an explicit way of performing existential quantification. Instead of the for expression above, we could have written this:

<xsl:for-each select="//vertex[not(some $e in //directed-edge-to
                                   satisfies @name eq $e/@vertex)]">

In addition to the "some" syntax, XPath 2.0 also supplies a corresponding "every" syntax for performing universal quantification.
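In Python terms, `some` and `every` behave like `any()` and `all()`. The sketch below uses an assumed dictionary version of the sample DAG and rewrites the predicate above both ways. The last line is a hypothetical `every` check, included only to illustrate universal quantification.

```python
# Python's any()/all() correspond to XPath 2.0's "some"/"every".
# DAG is an assumed stand-in for the sample document: name -> edge targets.
DAG = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": ["E"],
       "E": ["G"], "F": ["G"], "G": []}
edge_targets = [t for targets in DAG.values() for t in targets]  # //directed-edge-to/@vertex

# //vertex[not(some $e in //directed-edge-to satisfies @name eq $e/@vertex)]
minimal_some = [name for name in DAG
                if not any(name == t for t in edge_targets)]

# The same test phrased with "every" (equivalent here because every vertex has a @name):
# //vertex[every $e in //directed-edge-to satisfies @name ne $e/@vertex]
minimal_every = [name for name in DAG
                 if all(name != t for t in edge_targets)]

print(minimal_some)   # ['A', 'B', 'F']
print(minimal_every)  # ['A', 'B', 'F']

# A hypothetical universal check: does every edge point at a declared vertex?
# every $e in //directed-edge-to satisfies $e/@vertex = //vertex/@name
print(all(t in DAG for t in edge_targets))  # True
```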

Rather than using for-each, you could also use template rules, which are more modular (and powerful):

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <minimal-vertices>
      <xsl:apply-templates/>
    </minimal-vertices>
  </xsl:template>

  <!-- Output a minimal-vertex for each vertex that has no arrows pointing to it -->
  <xsl:template match="vertex[not(@name = //directed-edge-to/@vertex)]">
    <minimal-vertex name="{@name}"/>
  </xsl:template>

</xsl:stylesheet>

Again, in this case, we're relying on the existential quantification of =.

XSLT 1.0 prohibits use of the current() function in patterns, i.e., in the match attribute, but XSLT 2.0 allows it. In that case, current() refers to the node currently being matched. So in XSLT 2.0, we could also write this (without having to use a for expression):

<xsl:template match="vertex[not(//directed-edge-to[@vertex eq current()/@name])]">

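Note that this pattern is essentially the same as the expression you tried to use in for-each. It doesn't do what you want in for-each, but it does in the pattern, because current() is bound to something different. In a for-each select expression, current() is the node being processed where the for-each appears, not each vertex being tested. In a pattern, it is the vertex being matched.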
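The following Python toy model illustrates the difference. It assumes the for-each is evaluated inside a template that matches the root node; the question's own stylesheet isn't reproduced on this page. `ROOT`, `name_of`, and `keeps` are illustrative names. The point is that the predicate depends only on current(), so its result changes only with what current() is bound to.

```python
# Toy model: the same predicate in a for-each select vs. a match pattern.
# DAG is an assumed stand-in for the sample document: name -> edge targets.
DAG = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": ["E"],
       "E": ["G"], "F": ["G"], "G": []}
edge_targets = [t for targets in DAG.values() for t in targets]
ROOT = "/"  # hypothetical stand-in for the document node, which has no @name

def name_of(node):
    """current()/@name as a list: empty for the root, one value for a vertex."""
    return [] if node == ROOT else [node]

def keeps(current):
    """not(//directed-edge-to[@vertex eq current()/@name]).
    It depends only on current(), never on the vertex being tested."""
    names = name_of(current)
    # With an empty right-hand side, eq yields an empty (false) result,
    # so no edge is selected.
    matching_edges = [t for t in edge_targets if names and t == names[0]]
    return not matching_edges

# for-each select="//vertex[...]" inside a template matching "/":
# current() is the root for every candidate, so every vertex is kept.
for_each_result = [v for v in DAG if keeps(ROOT)]

# match="vertex[...]": current() is the vertex being matched.
pattern_result = [v for v in DAG if keeps(v)]

print(for_each_result)  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(pattern_result)   # ['A', 'B', 'F']
```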

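Finally, I'll add one more variation that in some ways simplifies the logic (removing not()). It relies on XSLT's conflict resolution: when both vertex rules match, the one whose pattern has a predicate wins because it has a higher default priority. This also goes back to using XSLT 1.0: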

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <minimal-vertices>
      <xsl:apply-templates/>
    </minimal-vertices>
  </xsl:template>

  <!-- By default, output a minimal-vertex for each vertex -->
  <xsl:template match="vertex">
    <minimal-vertex name="{@name}"/>
  </xsl:template>

  <!-- But strip out vertices with incoming arrows -->
  <xsl:template match="vertex[@name = //directed-edge-to/@vertex]"/>

</xsl:stylesheet>
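The following toy Python model isolates the rule selection and shows how a processor chooses between the two `vertex` rules in the stylesheet above. The numbers are XSLT's default priorities: 0 for a bare name such as `vertex`, and 0.5 for a pattern with a predicate. `RULES` and `apply_templates` are illustrative names, not an XSLT API. The model ignores import precedence, modes, and the built-in rules.

```python
# Toy model of XSLT conflict resolution for the two vertex rules above.
# Default priorities: match="vertex" -> 0, match="vertex[...]" -> 0.5.
DAG = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": ["E"],
       "E": ["G"], "F": ["G"], "G": []}
edge_targets = {t for targets in DAG.values() for t in targets}

# (priority, pattern test, template body): illustrative, not an XSLT API
RULES = [
    (0.0, lambda name: True, lambda name: f'<minimal-vertex name="{name}"/>'),
    # vertex[@name = //directed-edge-to/@vertex]: an empty rule, outputs nothing
    (0.5, lambda name: name in edge_targets, lambda name: ""),
]

def apply_templates(name):
    """Run the body of the highest-priority rule whose pattern matches."""
    matching = [rule for rule in RULES if rule[1](name)]
    _priority, _pattern, body = max(matching, key=lambda rule: rule[0])
    return body(name)

output = "".join(apply_templates(name) for name in DAG)
print(output)
```

For vertices C, D, E, and G both rules match, and the empty 0.5 rule wins. Only A, B, and F fall through to the 0 rule that produces output.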

If you don't like the whitespace being output, add an empty rule for text nodes, so they'll get stripped out (overriding the default rule for text nodes, which is to copy them):

<xsl:template match="text()"/>

Or you could just be more selective in what nodes you apply templates to:

<xsl:apply-templates select="/dag/vertex"/>

Which approach you take depends partly on taste and partly on the wider context of your stylesheet and expected data (how much the input structure might vary, etc.).

I know I went way beyond what you were asking for, but I hope you at least found this interesting. :-)

Evan Lenz 2009-05-10 10:38:00

Great answer! Thanks for all of the variations and clear explanations. Hopefully this answer will help lots of people in the future! (This could have been broken into several answers.)

Greg Mattes 2009-05-11 12:06:53

I'm glad you found it helpful. Thanks for the vote. I'm still learning how to use this website. Should I have provided separate responses?

Evan Lenz 2009-05-13 07:51:49

Providing separate answers or one answer with several variations is a matter of taste. Independent answers allow independent voting. For example, maybe I would have accepted an answer that uses apply-templates as the best response, but the community might have preferred an answer using for-each. Other alternatives could have been down-voted. My accepted answer would be shown first and the community answer second when sorting by votes. Comments could be addressed to particular solutions.

Greg Mattes 2009-05-13 19:42:40

Makes perfect sense. Thanks for the tips!

Evan Lenz 2009-05-15 06:04:42

Answer 2

One such XPath 1.0 expression is:

/*/vertex[not(@name = /*/vertex/directed-edge-to/@vertex)]

Then just put it into an XSLT stylesheet like this:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/">
      <minimal-vertices>
       <xsl:for-each select=
       "/*/vertex[not(@name = /*/vertex/directed-edge-to/@vertex)]"
       >
        <minimal-vertex name="{@name}"/>
       </xsl:for-each>
      </minimal-vertices>
    </xsl:template>
</xsl:stylesheet>

When this stylesheet is applied to the originally provided XML document:

<dag>
    <vertex name="A">
     <directed-edge-to vertex="C"/>
    </vertex>
    <vertex name="B">
     <directed-edge-to vertex="C"/>
     <directed-edge-to vertex="D"/>
    </vertex>
    <vertex name="C">
     <directed-edge-to vertex="E"/>
    </vertex>
    <vertex name="D">
     <directed-edge-to vertex="E"/>
    </vertex>
    <vertex name="E">
     <directed-edge-to vertex="G"/>
    </vertex>
    <vertex name="F">
     <directed-edge-to vertex="G"/>
    </vertex>
    <vertex name="G"/>
</dag>

The desired result is produced:

<minimal-vertices>
  <minimal-vertex name="A" />
  <minimal-vertex name="B" />
  <minimal-vertex name="F" />
</minimal-vertices>
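For a quick cross-check outside an XSLT processor, this Python sketch parses the same document with the standard library's `xml.etree.ElementTree`. It applies the logic of `/*/vertex[not(@name = /*/vertex/directed-edge-to/@vertex)]`. ElementTree supports only a limited XPath subset that cannot express that predicate, so the comparison is done in Python instead.

```python
import xml.etree.ElementTree as ET

# The sample document above, compacted onto fewer lines.
DAG_XML = """<dag>
  <vertex name="A"><directed-edge-to vertex="C"/></vertex>
  <vertex name="B"><directed-edge-to vertex="C"/><directed-edge-to vertex="D"/></vertex>
  <vertex name="C"><directed-edge-to vertex="E"/></vertex>
  <vertex name="D"><directed-edge-to vertex="E"/></vertex>
  <vertex name="E"><directed-edge-to vertex="G"/></vertex>
  <vertex name="F"><directed-edge-to vertex="G"/></vertex>
  <vertex name="G"/>
</dag>"""

root = ET.fromstring(DAG_XML)  # the <dag> element, i.e. /*

# /*/vertex/directed-edge-to/@vertex, collected once into a set
targets = {edge.get("vertex") for edge in root.findall("vertex/directed-edge-to")}

# /*/vertex[not(@name = ...)], keeping document order
minimal = [v.get("name") for v in root.findall("vertex") if v.get("name") not in targets]

result = ET.Element("minimal-vertices")
for name in minimal:
    ET.SubElement(result, "minimal-vertex", name=name)
print(ET.tostring(result, encoding="unicode"))
```

Collecting the targets into a set first means each vertex is checked with a single lookup instead of a scan over all edges. Whether a given XSLT processor optimizes the `=` comparison in a similar way is implementation-dependent.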

Do note: A solution for traversing full (maybe cyclic) graphs is available in XSLT here.

Dimitre Novatchev 2009-05-10 13:51:25

Thanks! This is a great answer too; it's very focused on the question that I asked. It was a tough decision, but I accepted Evan's answer because of its breadth. I am curious about why you prefer the /*/ syntax to //. Is there any advantage to the extra character?

Greg Mattes 2009-05-11 11:57:13

@greg-mattes The "//" abbreviation should be avoided whenever possible, as it is very expensive: it causes the whole subtree rooted at the context node to be searched, and "//" at the top level causes the whole XML document to be searched. It is very important *not* to use "//" whenever the structure of the XML document is known at the time of writing the XPath expression.

Dimitre Novatchev 2009-05-11 16:43:59
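One rough way to picture the difference is to count how many elements each path has to look at. The Python sketch below does this for the sample document using `xml.etree.ElementTree`. It models a naive evaluation only, and real XPath processors may index or otherwise optimize `//`. Treat the counts as an illustration of how much of the tree is walked, not as a performance measurement.

```python
import xml.etree.ElementTree as ET

# The sample DAG document, compacted onto fewer lines.
DAG_XML = """<dag>
  <vertex name="A"><directed-edge-to vertex="C"/></vertex>
  <vertex name="B"><directed-edge-to vertex="C"/><directed-edge-to vertex="D"/></vertex>
  <vertex name="C"><directed-edge-to vertex="E"/></vertex>
  <vertex name="D"><directed-edge-to vertex="E"/></vertex>
  <vertex name="E"><directed-edge-to vertex="G"/></vertex>
  <vertex name="F"><directed-edge-to vertex="G"/></vertex>
  <vertex name="G"/>
</dag>"""

root = ET.fromstring(DAG_XML)  # the document element, /*

# Naive "//vertex": walk every element in the document.
visited_all = 0
found_all = []
for elem in root.iter():
    visited_all += 1
    if elem.tag == "vertex":
        found_all.append(elem.get("name"))

# "/*/vertex": look only at the children of the document element.
visited_children = 0
found_children = []
for elem in root:
    visited_children += 1
    if elem.tag == "vertex":
        found_children.append(elem.get("name"))

print(visited_all, visited_children)  # 15 7
print(found_all == found_children)    # True: same vertices, less tree walked
```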

So /*/ is better in general because it limits the search to a single level: * "selects all element children of the context node" (http://www.w3.org/TR/xpath#path-abbrev) rather than all descendants, which could be a large search? In this particular example it shouldn't make a difference, but it's a good point to keep in mind. Thanks again.

Greg Mattes 2009-05-11 19:17:43

I agree with Dimitre about the use of "//". And you're right that performance isn't much of a consideration for this particular data. However, another reason to use /*/vertex, or, even better, /dag/vertex, is that it makes your intentions more explicit. "*" implies that the document element's name may vary, and "//" implies that <vertex> elements might appear as deeper descendants. You can save people reading your code from having to wonder such things by making your intentions more explicit. "//" is still useful, of course, when it's necessary, i.e., when it actually is your intention.

Evan Lenz 2009-05-13 07:57:44
