Hey, I just started using Python recently and I want to use it with a bit of XPath. The thing is, when I print the result of the query I only get [] and I don't know why =S

    import libxml2, urllib

    doc = libxml2.parseDoc(urllib.urlopen("http://www.domain.com/").read())
    result = doc.xpathEval("//th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a")

    if result != []:
        print result
    elif result == "":
        print "null"
    else:
        print result

    doc.freeDoc()

I get no error whatsoever, just a []. What could it be? Also, is there any better documentation for libxml2 than the one here? I find it really confusing =S


Edit

I changed the code, so now I get more than the []. I get the following output, which should be related to the invalid HTML I'm trying to parse (it's not mine, so I can't modify it). Any ideas on how to tell Python to be more forgiving about that?

    ^
    Entity: line 3552: parser error : Premature end of data in tag tr line 209
    ^
    Entity: line 3552: parser error : Premature end of data in tag tbody line 208
    ^
    Entity: line 3552: parser error : Premature end of data in tag table line 207
    ^
    Entity: line 3552: parser error : Premature end of data in tag input line 206
    ^
    Entity: line 3552: parser error : Premature end of data in tag input line 205
    ^
    Entity: line 3552: parser error : Premature end of data in tag form line 204
    ^
    Entity: line 3552: parser error : Premature end of data in tag table line 99
    ^
    Entity: line 3552: parser error : Premature end of data in tag div line 98
    ^
    Entity: line 3552: parser error : Premature end of data in tag body line 96
    ^
    Entity: line 3552: parser error : Premature end of data in tag html line 3
    ^
    Traceback (most recent call last):
      File "C:\Python26\lib\site-packages\libxml2.py", line 1263, in parseDoc
        if ret is None:raise parserError('xmlParseDoc() failed')
    libxml2.parserError: xmlParseDoc() failed

It's actually a longer list, but there's no point in pasting it all here, since all the errors are due to invalid HTML.
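
For anyone hitting the same wall: `parseDoc` is a strict XML parser, so unclosed tags are fatal to it. One way around that is to use a parser built for real-world HTML. The sketch below is not libxml2; it swaps in the standard library's `html.parser` (Python 3), which does not raise on unclosed tags, and collects the links in the second `td` of every row by hand instead of with XPath. The markup and the class name `SecondCellLinks` are made up for illustration, and nested tables are not handled. libxml2 also ships an HTML parser of its own, which I believe the Python bindings expose as `libxml2.htmlParseDoc`, but check that against the bindings you have installed.

```python
from html.parser import HTMLParser


class SecondCellLinks(HTMLParser):
    """Collect href values of <a> tags inside the second <td> of each <tr>."""

    def __init__(self):
        super().__init__()
        self.cell_index = 0       # how many <td> tags seen in the current row
        self.in_second = False    # True while inside the second <td>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: reset the cell counter.
            self.cell_index = 0
            self.in_second = False
        elif tag == "td":
            self.cell_index += 1
            self.in_second = (self.cell_index == 2)
        elif tag == "a" and self.in_second:
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical broken markup: no closing </td>, </tr>, </table>, </body>, </html>.
BROKEN_HTML = """<html><body><table>
<tr><td><a href="/skip">first</a><td><a href="/one">second</a>
<tr><td>x<td><a href="/two">second</a>
"""

parser = SecondCellLinks()
parser.feed(BROKEN_HTML)
parser.close()
links = parser.links
print(links)
```

The trade-off is that you lose XPath, so this only makes sense when the query is as simple as "links in the second cell".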

+1  A: 

It could be that your XPath doesn't select any elements. For example, you are looking for `td` elements inside `th` elements, but those elements are peers (both are children of a `tr`) and shouldn't nest.

Why do you say `(count(preceding-sibling::*) + 1) = 2` instead of `count(preceding-sibling::*) = 1`?

If you use a simpler XPath, do you get the results you expect?
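
A quick way to check is to run both a `th`-based and a `tr`-based path against a small table. This is only a sketch: it uses the standard library's `xml.etree.ElementTree` (which supports a subset of XPath) rather than libxml2, and the table below is a made-up example, not the page from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed table: header cells and data cells are siblings
# inside rows, never nested in each other.
TABLE = """<table>
  <tr><th>Name</th><th>Link</th></tr>
  <tr><td>a</td><td><a href="/one">one</a></td></tr>
  <tr><td>b</td><td><a href="/two">two</a></td></tr>
</table>"""

root = ET.fromstring(TABLE)

# td elements that are descendants of th elements: there are none here.
nested = root.findall(".//th//td")

# Links inside the second td of each row.
in_rows = root.findall(".//tr/td[2]//a")

print(nested)
print([a.get("href") for a in in_rows])
```

If the `th` version comes back empty and the `tr` version doesn't, the path is the problem rather than the printing.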

Ned Batchelder
for that matter, why not `position()=2`?
outis
I may have overcomplicated it a bit haha, but the thing is, this is the only expression that managed to return EVERY link inside a table. Otherwise I would just get one or two.
Luis Armando
A: 

Are you confusing `th` and `tr`? Change your `th` to `tr`.

andrew cooke
No, I'm not. Again, the XPath expression works perfectly; I just can't get anything more than a [] when trying to print it. The odd thing is, the same thing works just fine in PHP, as well as in Java.
Luis Armando
An XPath beginning with `//th//td` will select only `td` elements that are descendants of `th` elements. Unless your XHTML has table headers that contain tables, that XPath's not going to return anything.
Robert Rossney
Yes, it IS returning the links inside the table. I've tested it quite extensively.
Luis Armando
A: 

Side note: Where does all that unnecessary complexity in your XPath come from? This:

    //th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a

is equivalent to:

    //th//td[count(preceding-sibling::*) = 1]//a

and very probably even to:

    //th/td[2]//a
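
The "very probably" matters: the `count(preceding-sibling::*) = 1` predicate looks at preceding siblings of any element type, while `td[2]` only counts `td` siblings. A small sketch of where they diverge, using the standard library's `xml.etree.ElementTree` on a made-up row. ElementTree has no `count()`, so the first predicate is emulated in plain Python here.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: one header cell followed by two data cells.
ROW = ET.fromstring("<tr><th>h</th><td>first</td><td>second</td></tr>")

# Emulates td[count(preceding-sibling::*) = 1]: the td must be the second
# child element of its parent, whatever kind of element the first child is.
by_count = [child.text for i, child in enumerate(ROW)
            if child.tag == "td" and i == 1]

# td[2]: the second td among the td children of the row.
by_position = [td.text for td in ROW.findall("td[2]")]

print(by_count)
print(by_position)
```

When every cell in the row is a `td`, both forms pick the same cell.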
Tomalak