An HTML page has paging links: one set at the top of the page and another at the bottom.

Using HtmlUnit, I am currently getting the HtmlAnchor on the page using getAnchorByText("1");

Some of the links at the top are problematic, so I want to reference the bottom links using XPath.

nextPageAnchor = (HtmlAnchor) page.getByXPath("");

How can I reference the second link on the page using XPath?

I need to reference the link by its anchor text, for example a link like:

<a href="....">33</a>

The href contains unpredictable text and is a JavaScript function call, so I have no idea what it will be.

Is this possible with XPath?

+3  A: 

It's pretty simple:

 (//a)[2]

The //a selects all anchors in the document, and the [2] picks the second one. XPath positions are one-indexed, not zero-indexed, so 2 really is the second element, not the third as it would be with an array index.

If you want to get a link with the text of 33 then you can use:

 //a[./text() = "33"]

See http://www.w3.org/TR/xpath/ for the full xpath definition.

EDIT

To address Alexandre's comment, you could use

 (//a[./text() = "33"])[2]

This will first select all <a> tags with a text of 33, and then it will select the second of those.
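
To make the text-based selection concrete, here is a minimal sketch that evaluates both expressions with the JDK's built-in javax.xml.xpath rather than HtmlUnit. The page markup, the class name PagingLinkDemo, and the helpers hrefs and secondOfTwo are all made up for illustration. HtmlUnit's getByXPath also returns a list, so the same expressions should carry over, but that is not tested here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PagingLinkDemo {
    // Hypothetical page: one set of paging links at the top, one at the bottom.
    static final String PAGE =
        "<body>"
        + "<div id=\"top\"><a href=\"#t1\">1</a> <a href=\"#t33\">33</a></div>"
        + "<p>content</p>"
        + "<div id=\"bottom\"><a href=\"#b1\">1</a> <a href=\"#b33\">33</a></div>"
        + "</body>";

    // Evaluates an XPath expression against PAGE and returns the href
    // of every matched element, in document order.
    static List<String> hrefs(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Defensive variant: take the second match only when exactly two are found.
    static String secondOfTwo(String expression) {
        List<String> matches = hrefs(expression);
        return matches.size() == 2 ? matches.get(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(hrefs("//a[./text() = '33']"));       // [#t33, #b33]
        System.out.println(hrefs("(//a[./text() = '33'])[2]"));  // [#b33]
        System.out.println(secondOfTwo("//a[./text() = '1']"));  // #b1
    }
}
```

XPath string literals may use single or double quotes; single quotes are used here only to avoid escaping inside Java strings.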

EDIT 2

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.

markusk is indeed correct. The note above is quoted from the XPath specification linked earlier.

Jonathan Fingland
Perhaps we want the *second link* with *the given text* but the question isn't too clear
Alexandre Jasmin
Thanks, I actually just returned the array, and then IF it has 2 elements, got the 2nd one. A little safer that way but thanks for the tip!
Blankman
Happy to help.
Jonathan Fingland
`(//a)[2]` should be used instead. The expression `//a[2]` will select the second `a` child of any parent node, not the second `a` element in the entire document.
markusk
+3  A: 

To select the second a element anywhere in the document:

(//a)[2]

To select the second a element with a particular value of the href attribute:

(//a[@href='...'])[2]

Note that the parentheses are required, and that the expression //a[2] will not do what you intend: it will select every a element that is the second a child of its parent. If your input is

<p>Link <a href="one.html">One</a></p>
<p>Link <a href="two.html">Two</a> and <a href="three.html">Three</a>.</p>
<p>Link <a href="four.html">Four</a> and <a href="five.html">Five</a>.</p>

(//a)[2] will return the second link (two.html), while //a[2] will return the third and fifth links (three.html and five.html), since both are the second a child of their parent.
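
The difference can be checked with a short sketch using the JDK's javax.xml.xpath (not HtmlUnit). The input is the sample above, wrapped in a body root element so that it parses as well-formed XML. That wrapper, the class name SecondAnchorDemo, and the helper hrefs are assumptions added for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SecondAnchorDemo {
    // The sample input, wrapped in a hypothetical <body> root element
    // so that it is a single well-formed XML document.
    static final String SAMPLE =
        "<body>"
        + "<p>Link <a href=\"one.html\">One</a></p>"
        + "<p>Link <a href=\"two.html\">Two</a> and <a href=\"three.html\">Three</a>.</p>"
        + "<p>Link <a href=\"four.html\">Four</a> and <a href=\"five.html\">Five</a>.</p>"
        + "</body>";

    // Evaluates an XPath expression against SAMPLE and returns the href
    // of every matched element, in document order.
    static List<String> hrefs(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Second a element in the whole document.
        System.out.println(hrefs("(//a)[2]"));  // [two.html]
        // Every a element that is the second a child of its parent.
        System.out.println(hrefs("//a[2]"));    // [three.html, five.html]
    }
}
```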

markusk