Hi,

Quick question: I can create and parse a chunk of HTML using libxml2dom, etc.

However, is there a way to display the XPath used to extract a given HTML chunk? I'm assuming there's some method for doing this that I can't find.

Example:

import libxml2dom

# s holds the HTML source as a string
d = libxml2dom.parseString(s, html=1)

hdr = "//div[3]/table[1]/tr/th"

thdr_ = d.xpath(hdr)
print("len = %d" % len(thdr_))

At this point, thdr_ is a list of objects, each of which points to a chunk of HTML (if you will).

I'm trying to figure out whether there's a way to get the XPath for, say, the thdr_[x] item of the list.

i.e.:

thdr_[0] = //div[3]/table[1]/tr[1]/th
thdr_[1] = //div[3]/table[1]/tr[2]/th
thdr_[2] = //div[3]/table[1]/tr[3]/th
.
.
.

Any thoughts or comments?

Thanks,

-tom

A: 

I did this by iterating over each node and comparing its textContent with my expected text. For fuzzy comparisons, I used the SequenceMatcher class from difflib.
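A rough sketch of that idea, not the exact code I used. It relies on the standard library's xml.dom.minidom in place of libxml2dom so it runs anywhere, and it assumes libxml2dom nodes expose the same DOM attributes (nodeType, nodeName, parentNode, previousSibling, childNodes). The helper names text_of, xpath_of and find_by_text, the 0.8 threshold, and the sample HTML are all made up for illustration. Note that XPath positions start at 1, not 0.

```python
from difflib import SequenceMatcher
from xml.dom import minidom


def text_of(node):
    """Concatenate all text beneath a DOM node (minidom has no textContent)."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def xpath_of(node):
    """Build an absolute, 1-based XPath by walking up through parentNode."""
    steps = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        # Count preceding siblings that share this element's tag name.
        index = 1
        sibling = node.previousSibling
        while sibling is not None:
            if (sibling.nodeType == sibling.ELEMENT_NODE
                    and sibling.nodeName == node.nodeName):
                index += 1
            sibling = sibling.previousSibling
        steps.append("%s[%d]" % (node.nodeName, index))
        node = node.parentNode
    return "/" + "/".join(reversed(steps))


def find_by_text(nodes, expected, threshold=0.8):
    """Return (xpath, ratio) for each node whose text is close to `expected`."""
    matches = []
    for node in nodes:
        ratio = SequenceMatcher(None, text_of(node).strip(), expected).ratio()
        if ratio >= threshold:
            matches.append((xpath_of(node), ratio))
    return matches


# Hypothetical sample document; minidom needs well-formed markup.
doc = minidom.parseString(
    "<html><body><table>"
    "<tr><th>Name</th><th>Price</th></tr>"
    "<tr><td>Widget</td><td>9.99</td></tr>"
    "</table></body></html>"
)
headers = doc.getElementsByTagName("th")
for path, ratio in find_by_text(headers, "Prices"):
    print("%s %.2f" % (path, ratio))
```

With the sample above, only the second header cell is close enough to "Prices", so the loop reports /html[1]/body[1]/table[1]/tr[1]/th[2]. Lower the threshold if your expected text is only a loose match for the page content.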

Plumo