After playing around with position() in vain, I googled for a solution and arrived at this older Stack Overflow question, which almost describes my problem.

The difference is that the nodeset I want the position within is dynamic, rather than a contiguous section of the document.

To illustrate I'll modify the example from the linked question to match my requirements. Note that each <b> element is within a different <a> element. This is the critical bit.

<root>
    <a>
        <b>zyx</b>
    </a>
    <a>
        <b>wvu</b>
    </a>
    <a>
        <b>tsr</b>
    </a>
    <a>
        <b>qpo</b>
    </a>
</root>

Now if I queried with XPath for a/b, I'd get a nodeset of the four <b> nodes. I then want to find the position within that nodeset of the node that contains the string 'tsr'. The solution in the other post breaks down here: count(a/b[.='tsr']/preceding-sibling::*)+1 returns 1 because preceding-sibling is navigating the document rather than the context node-set.

Is it possible to work within the context nodeset?

A: 

The reason you are getting 1 has nothing to do with context vs. document. It is because you are only counting the 'b' nodes within a single 'a' node, so the count is always 0 (there are never any preceding 'b' siblings) and adding 1 gives 1.

Rather, you need to count the 'a' nodes that precede the 'a' containing your 'b'.

Something like:

count(a[b[.='tsr']]/preceding-sibling::a)
Richard
Well, in this case it's the count of 'a's - but that's not really what I want to count. I want to count the number of 'b's in my context nodeset. If you imagine that some of the 'a' nodes contained more than one 'b' node, this could lead to a different count.
Phil Nash
@Tomalak - closer - but that doesn't get any additional <b> nodes that might be within the current <a> (but before the matching one)
Phil Nash
@Phil: Just add a `/b` to include the `b` children of the preceding `a`s inside the count (at the end). The key here is you need to start with the `a` parent of your `b`, not the `b` directly..
Richard
@Richard - thanks. Still trying to grok that (my mind has moved on to other things. I think I see what you're saying but still not sure how to piece it together)
Phil Nash
A: 

How about this..

count(a/b[.='tsr']/preceding-sibling::b) + count(a[b[.='tsr']]/preceding-sibling::a/b) + 1

Count the previous siblings of the b element within the current a element, and then count the b elements of all previous siblings of the a element. Or something like that.
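
As a rough cross-check of the two-part count described above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document is a hypothetical variant of the one in the question (some <a> elements hold two <b> children), and the function name is made up for illustration. It is a plain loop that mirrors the arithmetic of the XPath expression, not an evaluation of that expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the question's document: some <a> hold two <b>.
XML = """<root>
    <a><b>zyx</b></a>
    <a><b>wvu</b><b>extra</b></a>
    <a><b>nop</b><b>tsr</b></a>
    <a><b>qpo</b></a>
</root>"""


def position_by_two_counts(root, text):
    """b siblings before the match + b children of earlier a's + 1."""
    cousins = 0  # b children of the a elements before the matching a
    for a in root.findall("a"):
        bs = a.findall("b")
        for siblings_before, b in enumerate(bs):
            if b.text == text:
                return siblings_before + cousins + 1
        cousins += len(bs)
    return None  # no b with that text


root = ET.fromstring(XML)
print(position_by_two_counts(root, "tsr"))  # prints 5
```

Here 'tsr' has one preceding sibling and three <b> elements under the earlier <a> elements, hence 1 + 3 + 1.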

Tim C
A: 

Hi.

I do not think it is possible to get what you want with a single XPath expression. If you are working in code, I would select all the //b nodes and then iterate over the node list to find the element you want.

If you are using XSL you could use the following:

<?xml version="1.0"?> 
<xsl:stylesheet  
 version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns="http://www.w3.org/1999/xhtml">

 <xsl:output method="text"/> 

 <xsl:template match="/"> 
    <xsl:call-template name="findPos">
        <xsl:with-param name="list" select="//b"/>
        <xsl:with-param name="find">tsr</xsl:with-param>
    </xsl:call-template>
 </xsl:template> 

<xsl:template name="findPos">
    <xsl:param name="list"/>
    <xsl:param name="pos">1</xsl:param>
    <xsl:param name="find"/>

    <xsl:choose>
        <xsl:when test="count ($list) &lt; $pos">
            <xsl:text>Not Found</xsl:text>
        </xsl:when>
        <xsl:when test="$find = $list [position () = $pos]">
            <xsl:value-of select="$pos"/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:call-template name="findPos">
                <xsl:with-param name="list" select="$list"/>
                <xsl:with-param name="pos" select="$pos + 1"/>
                <xsl:with-param name="find" select="$find"/>
            </xsl:call-template>
        </xsl:otherwise>
    </xsl:choose>

</xsl:template>

</xsl:stylesheet> 

You simply call the template with the string you want to find.

Jose Conde
A: 

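Evaluated from (i.e. against) the root: count(//a/b[.='tsr']/preceding::b)

This gives the number of b elements before the match, so add 1 to get the position.

If you had, say, another node such as <c> <b>qqq</b> </c>

and wanted to ignore all b elements that don't have an "a" parent, you could do something like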

count(//a/b[.='tsr']/preceding::b[local-name(parent::node())='a']) etc

dingo99
+1  A: 

I think I have a working solution

The idea is to count how many nodes precede our target element in the document, and then count how many nodes in the nodeset have that many preceding nodes or fewer. In XPath this is:

count(//a/b[count(./preceding::node()) <= count(//a/b[.='tsr']/preceding::node())])

You can also use variables in this expression to find different nodesets or to match different text contents. The important part is that the variables have the correct type: a node-set, not a string that holds the path. Below is an XSLT example and an example output using the example document of the question as the input file

XSLT document

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output encoding="utf-8" method="text"/>

    <xsl:variable name="nodeset" select="//a/b"/>
    <xsl:variable name="path-string">//a/b</xsl:variable>
    <xsl:variable name="text">tsr</xsl:variable>

    <xsl:template match="/">
        <xsl:text>Find and print position of a node within a nodeset&#10;&#10;</xsl:text>

        <xsl:text>Position of "tsr" node in the nodeset = "</xsl:text>
        <xsl:value-of select="count(//a/b[count(./preceding::node()) &lt;= count(//a/b[.='tsr']/preceding::node()) ])"/>
        <xsl:text>"&#10;&#10;</xsl:text>

        <xsl:text>( Try the same using variables "$nodeset" and "$text" )&#10;</xsl:text>
        <xsl:text>Size of nodeset "$nodeset" = "</xsl:text>
        <xsl:value-of select="count($nodeset)"/>
        <xsl:text>"&#10;</xsl:text>
        <xsl:text>Variable "$text" = "</xsl:text>
        <xsl:value-of select="$text"/>
        <xsl:text>"&#10;</xsl:text>
        <xsl:text>Position of "</xsl:text>
        <xsl:value-of select="$text"/>
        <xsl:text>" node in the nodeset = "</xsl:text>
        <xsl:value-of select="count($nodeset[count(./preceding::node()) &lt;= count($nodeset[.=$text]/preceding::node()) ])"/>
        <xsl:text>"&#10;&#10;</xsl:text>

        <xsl:text>( Show that using a variable that has the path as a string does not work )&#10;</xsl:text>
        <xsl:text>Variable "$path-string" = "</xsl:text>
        <xsl:value-of select="$path-string"/>
        <xsl:text>"&#10;</xsl:text>
        <xsl:text>Result of "count($path-string)" = "</xsl:text>
        <xsl:value-of select="count($path-string)"/>
        <xsl:text>"&#10;&#10;</xsl:text>

        <xsl:text>End of tests&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>

Output from the example document

Find and print position of a node within a nodeset

Position of "tsr" node in the nodeset = "3"

( Try the same using variables "$nodeset" and "$text" )
Size of nodeset "$nodeset" = "4"
Variable "$text" = "tsr"
Position of "tsr" node in the nodeset = "3"

( Show that using a variable that has the path as a string does not work )
Variable "$path-string" = "//a/b"
Result of "count($path-string)" = "1"

End of tests

I have not tested my solution extensively so please give feedback if you use it.
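
For anyone who wants to sanity-check the idea outside an XPath engine, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is an approximation under stated assumptions: ElementTree has no preceding:: axis, so the sketch swaps in each element's document-order rank in place of count(preceding::node()), and it only sees elements, not text or comment nodes. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<root>
    <a><b>zyx</b></a>
    <a><b>wvu</b></a>
    <a><b>tsr</b></a>
    <a><b>qpo</b></a>
</root>"""


def position_by_document_order(root, path, text):
    """Count node-set members that come no later than the target."""
    # root.iter() walks elements in document order, so the index is a rank.
    order = {el: i for i, el in enumerate(root.iter())}
    nodeset = root.findall(path)
    targets = [el for el in nodeset if el.text == text]
    if not targets:
        return None
    target_rank = order[targets[0]]
    # Stand-in for: count(preceding) of member <= count(preceding) of target.
    return sum(1 for el in nodeset if order[el] <= target_rank)


root = ET.fromstring(XML)
print(position_by_document_order(root, "a/b", "tsr"))  # prints 3
```

As in the XPath version, the working set is named once and then compared member by member against the target, so the cost grows with the size of both the node-set and the document.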

jasso
+1 This is a clever solution. It may become slow for large nodesets, O(m * n) where m is the nodes in the working set and n is the nodes in the document. However it should work for any XPath expressions. You do have to specify the XPath for the working nodeset twice, but you do it in the exact same way so it isn't very error-prone. The only case where this approach might not work is if you're selecting nodes that aren't elements and don't have intervening elements: e.g. selecting text nodes among comment nodes. But if you changed `preceding::*` to `preceding::node()` you'd be in good shape.
LarsH
I meant to say, this answer is notable because it uses XPath only. However if you can use XSLT or some other surrounding tools, my answer is probably more efficient. :-)
LarsH
@LarsH Thanks, I was aiming for XPath only solution. Generally I also think that looping the set through in XSLT template would be better solution when possible.
jasso
I edited the solution to match your suggestion. Initially I didn't use `node()` because I was unsure about the results if the selection also contained attribute or namespace nodes. As I understand it, those don't really have an order in an XML document, so the position could vary between parser/XSLT processor implementations. For example, would it be a parser error to always read the XML in as canonical XML, which would order the attributes by name rather than as written in the document? After all, wouldn't the document model still remain the same?
jasso
@jasso It's true that attributes and namespace nodes (on the same element) don't have a defined document order in XSLT. (I don't know much about canonical XML and I believe it's a separate issue.) However this doesn't affect our question unless you're trying to find the position of an attribute or namespace node among a nodeset that includes other attribute/namespace nodes on the same element; and in that case, the answer is undefined, so no implementation can correctly compute it.
LarsH
+1  A: 

The earlier count-the-preceding(-sibling) answers work well in some cases; you're just re-specifying the context nodeset from the perspective of the item selected, and then applying count(preceding:: ) to it.

But in other cases, count-the-preceding is really hard to keep within the nodeset you want to work with, as you were hinting at. E.g. suppose your working nodeset was /html/body/div[3]//a (all the <a> anchors in the third <div> of the web page), and you wanted to find the position of a[@href="foo.html"] within that set. If you tried to use count(preceding::a), you'd accidentally be counting <a> anchors from other divs, i.e. outside your working nodeset. And if you tried count(preceding-sibling::a), you wouldn't get them all because the relevant <a> elements could be at any level.

You could try to restrict the count using preceding::a[ancestor::div[count(preceding-sibling::div) = 2]] but it gets really awkward fast, and still wouldn't be possible in all cases. Moreover you'd have to rework this expression if you ever updated the XPath expression for your working set, and keeping them equivalent would be non-trivial.

However if you're using XSLT, the following avoids these problems. If you can specify the working nodeset, you can find the position of a node within it matching supplied criteria. And you don't have to specify the nodeset twice:

    <xsl:for-each select="/root/a/b">
        <xsl:if test=". = 'tsr'"><xsl:value-of select="position()"/></xsl:if>
    </xsl:for-each>

This works because within the for-each, the context position "identifies the position of the context item in the sequence being processed."

If you aren't working in XSLT, what environment are you in? There is probably a similar construct there for iterating through the result of the outer XPath expression, and there you can maintain your own counter (if there's not a context position available), and test each item against your inner criteria.
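
As one possible illustration of that "maintain your own counter" approach, here is a minimal sketch in Python with the standard library's xml.etree.ElementTree. The helper name and the predicate are assumptions made for the example; findall returns the matches in document order, and enumerate supplies the counter.

```python
import xml.etree.ElementTree as ET

XML = """<root>
    <a><b>zyx</b></a>
    <a><b>wvu</b></a>
    <a><b>tsr</b></a>
    <a><b>qpo</b></a>
</root>"""


def position_in_nodeset(nodeset, matches):
    """Return the 1-based position of the first node satisfying `matches`."""
    for position, node in enumerate(nodeset, start=1):
        if matches(node):
            return position
    return None  # nothing in the node-set matched


root = ET.fromstring(XML)
working_set = root.findall("a/b")  # the outer expression, stated once
print(position_in_nodeset(working_set, lambda b: b.text == "tsr"))  # prints 3
```

The working set is specified only once, and the inner criterion is an ordinary predicate, so either can change without touching the other.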

The reason why the other guy's attempt on the older question, a/b[.='tsr']/position(), didn't work was because at each slash, a new context is pushed on the stack, so when position() is called, the context position is always 1. (This syntax only works in XPath 2.0 by the way.)

LarsH
@LarsH: Good explanation (+1). Maybe my answer will be interesting to you?
Dimitre Novatchev
@Dimitre: I look forward to digesting it. And learning the *right* answer. :-)
LarsH
+1  A: 

Here is a general solution that works for any node belonging to any node-set of nodes from the same document:

I am using XSLT to implement the solution, but finally obtain a single XPath expression that may be used with any other hosting language.

Let $vNodeSet be the node-set and $vNode be the node in this node-set whose position we want to find.

Then, let $vPrecNodes contain all nodes in the XML document preceding $vNode.

Then, let $vAncNodes contain all nodes in the XML document that are ancestors of $vNode.

The set of nodes in $vNodeSet that precede $vNode in document order consists of all nodes in the nodeset that belong also to $vPrecNodes and all nodes in the node-set that also belong to $vAncNodes.

I will use the well-known Kaysian formula for intersection of two nodesets:

$ns1[count(.|$ns2) = count($ns2)]

contains exactly the nodes in the intersection of $ns1 with $ns2.

Based on all this, let $vPrecInNodeSet be the set of nodes in $vNodeSet that precede $vNode in document order. The following XPath expression defines $vPrecInNodeSet:

$vNodeSet
      [count(.|$vPrecNodes) = count($vPrecNodes)
      or
       count(.|$vAncNodes) = count($vAncNodes)
      ]

Finally, the wanted position is: count($vPrecInNodeSet) +1

Here's how this all works together:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:variable name="vNodeSet" select="/*/a/b"/>

 <xsl:variable name="vNode" select="$vNodeSet[. = 'tsr'][1]"/>

 <xsl:variable name="vPrecNodes" select="$vNode/preceding::node()"/>

 <xsl:variable name="vAncNodes" select="$vNode/ancestor::node()"/>

 <xsl:variable name="vPrecInNodeSet" select=
  "$vNodeSet
      [count(.|$vPrecNodes) = count($vPrecNodes)
      or
       count(.|$vAncNodes) = count($vAncNodes)
      ]
  "/>

 <xsl:template match="/">
   <xsl:value-of select="count($vPrecInNodeSet) +1"/>
 </xsl:template>
</xsl:stylesheet>

When the above transformation is applied on the provided XML document:

<root>
    <a>
        <b>zyx</b>
    </a>
    <a>
        <b>wvu</b>
    </a>
    <a>
        <b>tsr</b>
    </a>
    <a>
        <b>qpo</b>
    </a>
</root>

the correct result is produced:

3

Do note: This solution does not depend on XSLT (used only for illustrative purposes). You may assemble a single XPath expression, substituting the variables with their definition, until there are no more variables to substitute.
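
To illustrate the same set reasoning in a non-XSLT host, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is a sketch under assumptions, not a translation of the XPath: ElementTree exposes neither the preceding:: nor the ancestor:: axis, so both sets are built by hand, only element nodes are considered, and the function and helper names are hypothetical. The membership test imitates the intersection formula: a node belongs to a set exactly when adding it does not change the set's size.

```python
import xml.etree.ElementTree as ET

XML = """<root>
    <a><b>zyx</b></a>
    <a><b>wvu</b></a>
    <a><b>tsr</b></a>
    <a><b>qpo</b></a>
</root>"""


def position_via_intersection(root, path, text):
    nodeset = root.findall(path)                       # plays $vNodeSet
    node = next((el for el in nodeset if el.text == text), None)  # $vNode
    if node is None:
        return None
    parent_of = {child: parent for parent in root.iter() for child in parent}
    ancestors = set()                                  # plays $vAncNodes
    current = node
    while current in parent_of:
        current = parent_of[current]
        ancestors.add(current)
    preceding = set()                                  # plays $vPrecNodes
    for el in root.iter():                             # document order
        if el is node:
            break
        if el not in ancestors:
            preceding.add(el)

    def in_set(el, s):
        # Size is unchanged by the union only if el is already a member.
        return len(s | {el}) == len(s)

    prec_in_nodeset = [el for el in nodeset
                       if in_set(el, preceding) or in_set(el, ancestors)]
    return len(prec_in_nodeset) + 1


root = ET.fromstring(XML)
print(position_via_intersection(root, "a/b", "tsr"))  # prints 3
```

The ancestor set matters when the node-set can contain nested nodes: an ancestor of the target comes before it in document order but is not on the preceding axis.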

Dimitre Novatchev
Great answer, as expected. These are techniques I'd seen before but was not fluent enough in to come up with it when needed. Just for fun I tried expanding the variables to see what the XPath-only equivalent looked like: `count(/*/a/b[count(. | /*/a/b[. = 'tsr'][1]/preceding::node()) = count(/*/a/b[. = 'tsr'][1]/preceding::node()) or count(. | /*/a/b[. = 'tsr'][1]/ancestor::node()) = count(/*/a/b[. = 'tsr'][1]/ancestor::node()) ]) + 1` (about 212 characters). One might need to put parens around the value of $vNodeSet for safety, e.g. if its last axis is a reverse one.
LarsH