I have a well-formed XHTML page. I want to find the destination URL of a link when I know the text that is linked.

Example

<a href="http://stackoverflow.com">programming questions site</a>
<a href="http://cnn.com">news</a>

I want an XPath expression such that, given "programming questions site", it returns http://stackoverflow.com, and given "news", it returns http://cnn.com.

+7  A: 

Should be something similar to this: "//a[text()='text_i_want_to_find']/@href"

Badaro
Will I ever learn XPath? When I see a query it is so obvious and easy to understand, but I am never able to write one on my own.
flybywire
+3  A: 
//a[text()='programming questions site']/@href

which basically identifies an anchor node <a> that has the text you want, and extracts the href attribute.
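
For anyone who wants to try this from a script, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is an adaptation rather than the exact query: ElementTree implements only a subset of XPath, with no text() test and no trailing /@href step, so the sketch matches with the [.='...'] predicate (Python 3.7 or later, and it compares all text inside the link, not just its direct text nodes) and then reads the attribute with .get(). PAGE and href_for_text are made-up names, and the sample markup leaves out the XHTML namespace a real page would declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page modelled on the two links in the question.
# A real XHTML page would also declare a namespace, omitted here.
PAGE = """<html><body>
<a href="http://stackoverflow.com">programming questions site</a>
<a href="http://cnn.com">news</a>
</body></html>"""


def href_for_text(xml_source, link_text):
    """Return the href of the first <a> whose text equals link_text, else None."""
    root = ET.fromstring(xml_source)
    # Assumes link_text contains no single quote, since it is pasted
    # straight into the predicate.
    link = root.find(f".//a[.='{link_text}']")
    return link.get("href") if link is not None else None


print(href_for_text(PAGE, "programming questions site"))  # http://stackoverflow.com
print(href_for_text(PAGE, "news"))                        # http://cnn.com
```

A full XPath engine should accept the original //a[text()='...']/@href expression as written; the rewrite is only needed because of ElementTree's limited path support.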

Brian Agnew
A: 

Think of the phrase in the square brackets as a WHERE clause in SQL.

So this query says: "select the href attribute (@) of an a tag that appears anywhere (//), but only where (the bracketed phrase) the textual content of the a tag is equal to 'programming questions site'."
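
The SQL analogy can be spelled out clause by clause. The Python sketch below is only an illustration of that reading, not a description of how an XPath engine works internally. The sample markup and the name select_hrefs are invented for the example, and a.text (the text before any child element) stands in for text().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup based on the question's example links;
# the first link is nested to show that depth does not matter.
PAGE = """<html><body>
<p><a href="http://stackoverflow.com">programming questions site</a></p>
<a href="http://cnn.com">news</a>
</body></html>"""


def select_hrefs(xml_source, wanted_text):
    root = ET.fromstring(xml_source)
    return [
        a.get("href")                      # SELECT: the href attribute (@href)
        for a in root.iter("a")            # FROM: every <a> at any depth (//a)
        if (a.text or "") == wanted_text   # WHERE: the bracketed condition
    ]


print(select_hrefs(PAGE, "programming questions site"))  # ['http://stackoverflow.com']
```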

Brian Travis
A: 

I came across this trying to figure out how to remove links from a page based on the text, but the example above only works if the text match is exact. What would an example look like if you wanted a partial match (such as "programming" in the example above), or if you were comparing the text to a pre-determined string of keywords to match on?
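
For a partial match, XPath 1.0 has a contains() function, so something along the lines of //a[contains(., 'programming')]/@href should select the first example link in an engine with full XPath support. Python's built-in ElementTree does not implement contains(), so the sketch below does the equivalent filtering in plain Python and also covers the keyword-list case. The sample markup, the name hrefs_matching, and the keyword lists are assumptions made for illustration, and the comparison is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page modelled on the two links in the question.
PAGE = """<html><body>
<a href="http://stackoverflow.com">programming questions site</a>
<a href="http://cnn.com">news</a>
</body></html>"""


def hrefs_matching(xml_source, keywords):
    """Return hrefs of <a> elements whose text contains any of the keywords."""
    root = ET.fromstring(xml_source)
    matches = []
    for a in root.iter("a"):
        text = "".join(a.itertext())  # all text inside the link
        if any(keyword in text for keyword in keywords):
            matches.append(a.get("href"))
    return matches


print(hrefs_matching(PAGE, ["programming"]))          # ['http://stackoverflow.com']
print(hrefs_matching(PAGE, ["programming", "news"]))  # both links, in document order
```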