ansaurus

Question

How could I optimize the XPath test not(preceding-sibling::sect1) for testing whether this is the first sect1 child element?

Answer 1

A:

I think you should just be able to do

  <xsl:if test="position() = 1">
    <!-- Special handling for first sect1 element goes here. -->
  </xsl:if>
  <!-- Common handling for all sect1 elements goes here. -->
  <xsl:if test="position() = last()">
    <!-- Special handling for last sect1 element goes here. -->
  </xsl:if>

since position() and last() are context-sensitive.

AakashM 2010-08-27 13:40:35

Doesn't this have quadratic complexity since position() is (probably?) an `O(n)` function?

Frerich Raabe 2010-08-27 13:58:52

I just realized that this is not equivalent: position() returns the position among *all* children. I just want to know whether it's the first `sect1` element, not whether it's the first child element overall.

Frerich Raabe 2010-08-27 14:09:03

@AakashM, it depends on how it's used. I think that in a match pattern or directly inside a for-each loop, yes, position() and last() are relative to the context node list. However, in other places, such as after "/" in an XPath expression, the context is different.

LarsH 2010-09-04 03:57:43

Answer 2

+1 A:

Assuming that the template is called in the context of a child-node selection, I offer the suggestions below. If it is called via a different axis (say preceding-sibling or ancestor), then your original approach is best.

Two possibilities are to simplify the tests, or to replace them with different templates:

Simpler tests:

<xsl:template match="sect1">
  <xsl:if test="position() = 1">
    <!-- Special handling for first sect1 element goes here. -->
  </xsl:if>
  <!-- Common handling for all sect1 elements goes here. -->
  <xsl:if test="position() = last()">
    <!-- Special handling for last sect1 element goes here. -->
  </xsl:if>
</xsl:template>

Different templates:

<xsl:template name="handleSect1">
  <!-- Common handling for all sect1 elements goes here. -->
</xsl:template>
<xsl:template match="sect1">
  <xsl:call-template name="handleSect1"/>
</xsl:template>
<xsl:template match="sect1[1]">
  <!-- Special handling for first sect1 element goes here. -->
  <xsl:call-template name="handleSect1"/>
</xsl:template>
<xsl:template match="sect1[last()]">
  <xsl:call-template name="handleSect1"/>
  <!-- Special handling for last sect1 element goes here. -->
</xsl:template>
<xsl:template match="sect1[position() = 1 and position() = last()]">
  <!-- Special handling for first sect1 element goes here. -->
  <xsl:call-template name="handleSect1"/>
  <!-- Special handling for last sect1 element goes here. -->
</xsl:template>

Since you say "optimise" I assume you care about which will process faster. That will vary according to the XSLT processor, its processing modes (some have a "compile" option, which affects which form is more efficient) and the input XML. The fastest may be either of these or your original.

In principle, each of these should be as efficient as the others; any difference comes from the optimisations the processor manages to make.

I would favour the first approach in this case, as it's the most concise, but if there were no common handling shared between all 4 cases, I would favour the second approach, which clearly separates the handling for each case.

Jon Hanna 2010-08-27 13:52:03

Is the first suggestion really correct? Doesn't position() return 1 only if the `sect1` element is the first child element? I want the code to be executed for the first `sect1` element, even if it's not the first child element.

Frerich Raabe 2010-08-27 14:09:58

It returns the position in the given context. If the context is such that other nodes could precede it, then the test could be "../sect1[1] = current()".

Jon Hanna 2010-08-27 14:46:35

@Jon Hanna: I think you really don't need the separate named template; you could just add @name to the rule matching "sect1". You could also add it to the rules matching first and last, to avoid repeated code in the rule matching a single occurrence (which is both first and last). About your last comment: to test identity you should use `generate-id(../sect1[1]) = generate-id(current())`.

Alejandro 2010-08-27 17:18:32
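
As an illustration of the identity test mentioned in this comment, here is a minimal sketch of a complete stylesheet. It assumes each `sect1` has a `title` child (as in DocBook), which the thread does not state, and the `first: ` / `last: ` labels are only placeholders for the special handling. Because the test compares node identity rather than `position()`, it should not matter how the `sect1` template was reached.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="sect1">
    <!-- generate-id() without an argument uses the context node.
         Two calls return the same string only for the same node. -->
    <xsl:if test="generate-id() = generate-id(../sect1[1])">
      <xsl:text>first: </xsl:text>
    </xsl:if>
    <xsl:if test="generate-id() = generate-id(../sect1[last()])">
      <xsl:text>last: </xsl:text>
    </xsl:if>
    <!-- Assumption: sect1 has a title child. -->
    <xsl:value-of select="title"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress text that the built-in templates would otherwise copy. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Note that `../sect1[1]` still selects among all `sect1` siblings, so whether this is any faster than the original test depends on the processor.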

@Jon Hanna: And last, the first suggestion requires not only that templates are applied along the child axis, but also an element name test, as in `apply-templates select="sect1"`.

Alejandro 2010-08-27 17:27:59
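
To make the point about `apply-templates select="sect1"` concrete, here is a small sketch. The parent element name `chapter` and the `title` child are assumptions for illustration; the question names neither. When only the `sect1` children are selected, they form the current node list, so `position()` and `last()` count `sect1` elements and ignore other siblings.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parent element; substitute the real one. -->
  <xsl:template match="chapter">
    <!-- Only sect1 children become the current node list. -->
    <xsl:apply-templates select="sect1"/>
  </xsl:template>

  <xsl:template match="sect1">
    <!-- position() and last() are relative to the selected sect1 nodes. -->
    <xsl:if test="position() = 1">first: </xsl:if>
    <xsl:if test="position() = last()">last: </xsl:if>
    <!-- Assumption: sect1 has a title child. -->
    <xsl:value-of select="title"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With a plain `<xsl:apply-templates/>` in the parent template instead, the node list would include every child node (including whitespace text nodes), and `position() = 1` would no longer identify the first `sect1`.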

Answer 3

+1 A:

Is it likely that the XSLT processor will stop assembling the preceding-sibling::sect1 node-set after the first found match because it knows that it just needs to find one or zero elements?

I don't know about xsltproc, but Saxon is very good at these sorts of optimizations. I believe it would only check for the first found match because it only needs to know whether the node-set is empty or not.

However you could always make sure by changing your tests as follows:

  <xsl:if test="not(preceding-sibling::sect1[1])">

and

  <xsl:if test="not(following-sibling::sect1[1])">

as this will only test for the first sibling along each axis. Note that the [1] in each case refers to the order of the XPath step, which is the order of the axis, not necessarily document order. So preceding-sibling::sect1[1] refers to the sect1 sibling immediately preceding the current element, not the first sect1 sibling in document order, because preceding-sibling is a reverse axis.

LarsH 2010-09-04 03:56:48
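
The reverse-axis behaviour described above can be seen with a short sketch. It again assumes a `title` child on each `sect1`, which is not stated in the thread, and the output wording is purely illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="sect1">
    <xsl:value-of select="title"/>
    <xsl:choose>
      <!-- [1] on a reverse axis step picks the nearest preceding sect1. -->
      <xsl:when test="preceding-sibling::sect1[1]">
        <xsl:text> follows </xsl:text>
        <xsl:value-of select="preceding-sibling::sect1[1]/title"/>
      </xsl:when>
      <!-- No preceding sect1 sibling: this is the first one. -->
      <xsl:otherwise> is the first sect1</xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress text that the built-in templates would otherwise copy. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

By contrast, `(preceding-sibling::sect1)[1]` with parentheses applies the predicate to the resulting node-set in document order, so it would select the earliest `sect1` sibling instead of the nearest one.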

@LarsH: Yes, a specific XSLT processor's optimization could take this shortcut, but from http://www.w3.org/TR/xpath/#predicates `A predicate filters a node-set with respect to an axis to produce a new node-set` and from http://www.w3.org/TR/xpath/#axes `the following-sibling axis contains all the following siblings of the context node`. It's not clear that this is an optimization you can rely on. In this case, as in others, pattern matching is the best approach.

Alejandro 2010-09-06 22:26:33

@Alejandro: I agree that one can't guarantee this optimization will occur without knowing which processor is being used. However, the OP was asking about the likelihood of optimization, and I believe the likelihood is high with Saxon. The pieces you quote from the spec specify the semantics rather than the implementation. Even though "the following-sibling axis contains all the following siblings of the context node", that does not mean a processor must instantiate or process all of them in order to be compliant.

LarsH 2010-09-07 10:43:17
