ansaurus

Question

Why doesn't xpath work when processing an XHTML document with lxml (in python)?

Answer 1

+7 A:

The problem is the namespaces. When parsed as XML, the img tag is in the http://www.w3.org/1999/xhtml namespace since that is the default namespace for the element. You are asking for the img tag in no namespace.

Try this:

>>> tree.getroot().xpath(
...     "//xhtml:img", 
...     namespaces={'xhtml':'http://www.w3.org/1999/xhtml'}
...     )
[<Element {http://www.w3.org/1999/xhtml}img at 11a29e0>]

Ned Batchelder 2008-11-17 22:45:15

Quoting from http://codespeak.net/lxml/xpathxslt.html<<Optionally, you can provide a namespaces keyword argument, which should be a dictionary mapping the namespace prefixes used in the XPath expression to namespace URIs>>

Cristian Ciupitu 2009-06-14 14:50:07

Answer 2

+4 A:

XPath considers all unprefixed names to be in "no namespace".

In particular the spec says:

"A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). "

See those two detailed explanations of the problem and its solution: here and here. The solution is to associate a prefix (with the API that's being used) and to use it to prefix any unprefixed name in the XPath expression.

Hope this helped.

Cheers,

Dimitre Novatchev

Dimitre Novatchev 2008-11-17 23:13:07

ansaurus

tags:

views:

answers:

Why doesn't xpath work when processing an XHTML document with lxml (in python)?

related questions