I want to find all stylesheet definitions in an XHTML file with lxml.etree.findall. This could be as simple as

elems = tree.findall('link[@rel="stylesheet"]') + tree.findall('style')

But the problem with CSS style definitions is that the order matters, e.g.

<link rel="stylesheet" type="text/css" href="/media/css/first.css" />
<style>body {font-size: 10px;}</style>
<link rel="stylesheet" type="text/css" href="/media/css/second.css" />

If the contents of the style tag are applied after the rules in the two link tags, the result may be completely different from the one where the rules are applied in order of definition.
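To make the ordering problem concrete, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` rather than lxml (an assumption made to keep the example dependency-free; `findall` accepts the same expressions here), and the `<head>` wrapper is added only so the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; the <head> wrapper is an assumption so it parses as XML.
xhtml = """<head>
<link rel="stylesheet" type="text/css" href="/media/css/first.css" />
<style>body {font-size: 10px;}</style>
<link rel="stylesheet" type="text/css" href="/media/css/second.css" />
</head>"""

head = ET.fromstring(xhtml)

# Two separate lookups, concatenated: every link comes first, then every style.
elems = head.findall('link[@rel="stylesheet"]') + head.findall('style')
concatenated_order = [el.tag for el in elems]

# The children of <head> in the order they actually appear in the document.
document_order = [el.tag for el in head]

print(concatenated_order)
print(document_order)
```

The concatenated list groups the elements by lookup, so the position of the style tag between the two links is lost.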

So, how would I do a lookup that includes both link[@rel="stylesheet"] and style?

+2  A: 

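This is possible using XPath. The union operator `|` combines both expressions, and lxml returns the matches in document order: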

data = """<link rel="stylesheet" type="text/css" href="/media/css/first.css" />
<style>body {font-size: 10px;}</style>
<link rel="stylesheet" type="text/css" href="/media/css/second.css" />
"""

from lxml import etree

h = etree.HTML(data)

h.xpath('//link[@rel="stylesheet"]|//style')

[<Element link at 97a007c>,
 <Element style at 97a002c>,
 <Element link at 97a0054>]
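
As a possible follow-up, the sketch below walks the result of the union expression and collects each stylesheet source in the order it appears. The `ordered` list and the `'external'`/`'inline'` labels are illustrative names, not part of lxml.

```python
from lxml import etree

data = """<link rel="stylesheet" type="text/css" href="/media/css/first.css" />
<style>body {font-size: 10px;}</style>
<link rel="stylesheet" type="text/css" href="/media/css/second.css" />
"""

root = etree.HTML(data)

# The union expression yields link and style elements in document order.
sheets = root.xpath('//link[@rel="stylesheet"] | //style')

# Hypothetical representation: (kind, payload) pairs, one per stylesheet.
ordered = []
for el in sheets:
    if el.tag == 'link':
        ordered.append(('external', el.get('href')))
    else:
        ordered.append(('inline', el.text))

for kind, payload in ordered:
    print(kind, payload)
```

Because the list preserves the definition order, the rules can then be applied one after another in the same sequence a browser would see them.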
MattH
Funny, I tried that expression with `tree.findall`, but it didn't occur to me that `tree.xpath` could work. Thanks!
piquadrat
You're welcome! As I was writing I realised that your `findall` expression contained a predicate, which I wasn't aware it would accept. I've not been back to `findall` since becoming familiar with `xpath`.
MattH