I'm processing an HTML page with a variable number of p elements with a css class "myclass", using Python + Selenium RC.

When I try to select each node with this xpath:

//p[@class='myclass'][n]

(with n a natural number)

I get only the first p element with this css class for every n. This differs from what happens when I iterate through ALL p elements with:

//p[n]

Is there any way I can iterate through elements by css class using xpath?

+1  A: 

XPath 1.0 doesn't provide an iterating construct.

Iteration can be performed on the selected node-set in the language that is hosting XPath.

Examples:

In XSLT 1.0:

   <xsl:for-each select="someExpressionSelectingNodes">
     <!-- Do something with the current node -->
   </xsl:for-each>

In C#:

using System;
using System.IO;
using System.Xml;

public class Sample {

  public static void Main() {

    XmlDocument doc = new XmlDocument();
    doc.Load("booksort.xml");

    XmlNodeList nodeList;
    XmlNode root = doc.DocumentElement;

    nodeList=root.SelectNodes("descendant::book[author/last-name='Austen']");

    //Change the price on the books.
    foreach (XmlNode book in nodeList)
    {
      book.LastChild.InnerText="15.95";
    }

    Console.WriteLine("Display the modified XML document....");
    doc.Save(Console.Out);

  }
}
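
In Python, the asker's host language, the same pattern might look like the sketch below. It uses the standard-library xml.etree.ElementTree module, which supports only a subset of XPath, and the sample markup is made up for illustration; a page driven through Selenium would need a different way to obtain the nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from the question.
SAMPLE = """
<html><body>
  <div><p class="myclass">first</p><p>skipped</p></div>
  <div><p class="myclass">second</p><p class="myclass">third</p></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Select the whole node set once; findall returns matches in document order.
matches = root.findall(".//p[@class='myclass']")

# Iterate in the host language instead of inside the XPath expression.
texts = [p.text for p in matches]
for index, text in enumerate(texts, start=1):
    print(index, text)
```

The point is that the XPath expression only selects; the loop lives in Python.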

XPath 2.0 has its own iteration construct:

   for $varname1 in someExpression1,
       $varname2 in someExpression2, 
      .  .  .  .  .  .  .  .  .  .  .
       $varnameN in someExpressionN 
    return
        SomeExpressionUsingTheVarsAbove
Dimitre Novatchev
Perhaps my question wasn't clear enough, but I can't see how your answer relates to it. I can use the [n] suffix to select one element from multiple simple matches, e.g. //p[n] to iterate through ALL the p elements. My problem starts when I try to iterate through just those p elements that have a certain class.
Gj
Whoever downvoted this answer, please come forward and state your reasons. Was it due to bad weather or to the fact that you are an incompetent coward? I guess it was the latter...
Dimitre Novatchev
@Gj: Why, just substitute someExpressionSelectingNodes from my answer with your expression (`//p[@class='myclass']`) that selects the nodes you want to iterate through. I have provided two examples of how the iteration is organized, in two different hosting languages. It has to be something similar in the hosting language you are using.
Dimitre Novatchev
A: 

Maybe all your p elements with this class are at the same level, so //p[@class='myclass'] gives you the whole set of paragraphs with the specified class. In that case you should iterate through it using indexes, i.e. //p[@class='myclass'][1], //p[@class='myclass'][2],...,//p[@class='myclass'][last()]

ZloiAdun
A: 

I don't think you're using the "index" for its real purpose. The //p[selection][index] syntax is actually saying which element within its parent the match should be... So //p[selection][1] is saying that your selected p must be the first child of its parent. //p[selection][2] is saying it must be the 2nd child. Depending on your html, it's likely this isn't what you want.

Given that you're using Selenium and Python, there are a couple of ways to do what you want, and you can look at this question to see them (there are two options given there, one in Selenium JavaScript, the other using the server-side Selenium calls).

Ryley
@Ryley: Under XPath, the `[n]` predicate (which is short for `[position() = n]`) means "select only the `n`th node of the context group". The context group is the set of nodes specified by the XPath expression preceding the predicate. This may or may not relate to its order among siblings of a particular parent. In this case it doesn't.
LarsH
@LarsH - yeah, you've got me... I was not able to explain that well at all. Do you agree that the linked SO answers suggest the right type of answer (also very similar to what Dimitre says)... if not, I'll probably just remove this answer.
Ryley
@Ryley - I'm not sure whether the linked answers are relevant. Actually, I seem to recall from my limited and long-ago Selenium experience that Selenium doesn't do real XPath but a limited subset and even then maybe not entirely correct. So that might have been the OP's problem. For all I know `[n]` in Selenium works the way you said instead of the way the XPath spec says. Like I said in my comment on the question, if we saw the context where @Gj is iterating, we might be able to solve the problem.
LarsH
A: 

Here's a C# code snippet that might help you out.

The key here is the Selenium function GetXpathCount(). It should return the number of occurrences of the Xpath expression you are looking for.

You can enter //p[@class='myclass'] in XPather or any other Xpath analysis tool so you can indeed verify multiple results are returned. Then you just iterate through the results in your code.

In my case, it was all the list items in a UL that needed to be iterated (i.e. //li[@class='myclass']/ul/li), so based on your requirements it should be something like:

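// Count the matches, then fetch each one by its position in the whole document.
int numProductsInLeftNav = Convert.ToInt32(selenium.GetXpathCount("//p[@class='myclass']"));

List<string> productsInLeftNav = new List<string>();
for (int i = 1; i <= numProductsInLeftNav; i++) {
    // Parentheses make [i] index the full match set rather than each parent's children;
    // the "xpath=" prefix is needed because the locator no longer starts with "//".
    string productName = selenium.GetText("xpath=(//p[@class='myclass'])[" + i + "]");
    productsInLeftNav.Add(productName);
}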
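
Since the question uses Python, a rough equivalent of this loop is sketched below. It assumes sel is an already-started Selenium RC client exposing get_xpath_count and get_text; the helper names nth_locator and texts_for are made up for illustration. The expression is wrapped in parentheses so that the index applies to the whole match set, and the explicit xpath= prefix is there because the locator no longer starts with //.

```python
def nth_locator(expr, n):
    """Build a locator for the n-th (1-based) node matched by expr in the whole document."""
    return "xpath=(%s)[%d]" % (expr, n)


def texts_for(sel, expr):
    """Collect the text of every node matched by expr, in document order."""
    count = int(sel.get_xpath_count(expr))  # the RC client may hand back a string
    return [sel.get_text(nth_locator(expr, i)) for i in range(1, count + 1)]


# Usage, assuming a running Selenium RC session named sel:
# paragraphs = texts_for(sel, "//p[@class='myclass']")
```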
Hector M
A: 

Now that I look again at this question, I think the real problem is not in iterating, but in using //.

This is a FAQ:

//p[@class='myclass'][1] 

selects every p element that has a class attribute with value "myclass" and that is the first such child of its parent. Therefore this expression may select many p elements, not just the first such p element in the document.

When we want to get the first p element in the document that satisfies the above predicate, one correct expression is:

(//p)[@class='myclass'][1] 

Remember: The [] operator has a higher priority (precedence) than the // abbreviation. Whenever you need to index the nodes selected by //, always put the expression to be indexed in parentheses.

Here is a demonstration:

<nums>
 <a>
  <n x="1"/>
  <n x="2"/>
  <n x="3"/>
  <n x="4"/>
 </a>
 <b>
  <n x="5"/>
  <n x="6"/>
  <n x="7"/>
  <n x="8"/>
 </b>
</nums>

The XPath expression:

//n[@x mod 2 = 0][1]

selects the following two nodes:

<n x="2" />
<n x="6" />

The XPath expression:

(//n)[@x mod 2 = 0][1]

selects exactly the first n element in the document with the wanted property:

<n x="2" />

Try this first with the following transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="//n[@x mod 2 = 0][1]"/>
 </xsl:template>
</xsl:stylesheet>

and the result is two nodes.

<n x="2" />
<n x="6" />

Now, change the XPath expression as below and try again:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="(//n)[@x mod 2 = 0][1]"/>
 </xsl:template>
</xsl:stylesheet>

and the result is what we really wanted -- the first such n element in the document:

<n x="2" />
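
For readers without an XSLT processor, the difference can also be emulated in plain Python. This is only a sketch: the standard-library xml.etree.ElementTree module does not evaluate either expression itself (it has no mod operator and no parenthesized paths), so the two loops below spell out by hand what each expression means. The variable and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<nums>
 <a><n x="1"/><n x="2"/><n x="3"/><n x="4"/></a>
 <b><n x="5"/><n x="6"/><n x="7"/><n x="8"/></b>
</nums>
""")


def is_even(node):
    return int(node.get("x")) % 2 == 0


# //n[@x mod 2 = 0][1]: for every parent, the first even n among its children.
per_parent = []
for parent in root.iter():
    evens = [child for child in parent if child.tag == "n" and is_even(child)]
    if evens:
        per_parent.append(evens[0].get("x"))

# (//n)[@x mod 2 = 0][1]: the first even n in the whole document.
all_evens = [n for n in root.iter("n") if is_even(n)]
whole_document = all_evens[0].get("x")

print(per_parent)      # ['2', '6']
print(whole_document)  # 2
```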
Dimitre Novatchev