Hi,

I'm using Python 2.4/2.5 with libxml2dom.

I'm trying to get my head around a question/issue that I'm considering.

I have a document: I import it and can build the DOM with libxml2dom.

I'm trying to figure out whether there's a way to programmatically "search" for a given term and then craft the XPath expression that extracts the href for that term.

That is, if the document contains a chunk of HTML like this:

...
<a href="dog">bigdog</a>
...

I'd like to have a function that would search for and find "bigdog", and return the XPath expression that gets its href link.

I can easily examine the content of the page by hand and write the XPath to get the href/link, but I'm wondering how one would go about it the other way round, and whether it's possible at all.
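A minimal sketch of that reverse direction, assuming modern Python 3 with the standard-library xml.etree.ElementTree rather than libxml2dom, and a well-formed, non-namespaced document: walk the tree, and for each <a> element whose text equals the term, build a positional XPath 1.0 expression ending in /@href. The helper name xpath_for_term is hypothetical.

```python
import xml.etree.ElementTree as ET


def xpath_for_term(root, term):
    """Hypothetical helper: return positional XPath expressions selecting the
    href attribute of every <a> element whose full text equals term.
    Assumes well-formed markup without XML namespaces."""
    results = []

    def walk(elem, path):
        if elem.tag == "a" and "href" in elem.attrib and "".join(elem.itertext()) == term:
            results.append(path + "/@href")
        # Count same-tag siblings so each step gets a 1-based XPath index.
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, "%s/%s[%d]" % (path, child.tag, counts[child.tag]))

    walk(root, "/" + root.tag)
    return results


doc = ET.fromstring(
    "<html><body><p>intro</p>"
    "<p><a href='cat'>smallcat</a> <a href='dog'>bigdog</a></p>"
    "</body></html>"
)
print(xpath_for_term(doc, "bigdog"))  # ['/html/body[1]/p[2]/a[2]/@href']
```

The returned string is plain XPath 1.0, so it could be handed to any XPath 1.0 engine. Positional paths like this break easily when the page layout changes; a content-based expression such as //a[text()='bigdog']/@href is usually sturdier.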

Thanks.

A: 

This XPath will select the @href of the a element whose text is "bigdog":

//a[text()='bigdog']/@href
Mads Hansen
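For reference, a sketch of running the same lookup in Python. This assumes Python 3.7+ and the standard-library xml.etree.ElementTree instead of libxml2dom. ElementTree implements only a subset of XPath and cannot select attributes with /@href, so the sketch selects the matching <a> elements and reads href from each one. Note that [.='bigdog'] compares the element's full text content, which for a simple link like this one gives the same result as text()='bigdog'.

```python
import xml.etree.ElementTree as ET

html = "<html><body><a href='cat'>smallcat</a><a href='dog'>bigdog</a></body></html>"
root = ET.fromstring(html)

# ElementTree's XPath subset has no /@attr step, so pick the <a> elements
# whose text content equals 'bigdog' and read their href in Python.
hrefs = [a.get("href") for a in root.findall(".//a[.='bigdog']")]
print(hrefs)  # ['dog']
```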
Hey Mads, thanks. I should have been clearer: I was actually looking for a way to use a regex, and to create an XPath that would then get the element based on the regex function, something like //a[text()='regex()']/@href
tom smith
libxml2dom only supports XPath 1.0, so XPath 2.0 functions like matches() (which supports regular expressions) are out. I'm not sure whether it supports EXSLT. If it does, you could use a regex in the EXSLT regexp:match() function: http://www.exslt.org/regexp/functions/match/index.html. Apparently lxml supports EXSLT extensions, so you could use that if needed.
Mads Hansen
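Since XPath 1.0 has no regex support of its own, one workaround (a swapped-in technique, not the EXSLT route) is to select candidate <a> elements with a tree walk and apply the regular expression in Python. A minimal sketch, assuming Python 3 with xml.etree.ElementTree; hrefs_matching is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET


def hrefs_matching(root, pattern):
    """Hypothetical helper: return the href values, in document order, of
    <a> elements whose text content matches pattern (re.search semantics)."""
    regex = re.compile(pattern)
    return [
        a.get("href")
        for a in root.iter("a")
        if a.get("href") is not None and regex.search("".join(a.itertext()))
    ]


root = ET.fromstring(
    "<html><body><a href='dog'>bigdog</a><a href='cat'>smallcat</a>"
    "<a href='pup'>Big Puppy</a></body></html>"
)
print(hrefs_matching(root, r"^big"))      # ['dog']
print(hrefs_matching(root, r"(?i)^big"))  # ['dog', 'pup']
```

With lxml, the EXSLT route would instead bind the namespace http://exslt.org/regular-expressions to a prefix and call its functions (for example re:test()) inside the XPath expression itself.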