views:

200

answers:

2

Is there an equivalent of Beautiful Soup's tag.renderContents() method in lxml?

I've tried using element.text, but that doesn't render child tags, as well as ''.join(etree.tostring(child) for child in element), but that doesn't render child text. The closest I've been able to find is etree.tostring(element), but that renders the opening and closing tags of element, which I do not want.

Is there another method I'm overlooking (or an alternative approach to accomplish this)?

A: 

One hackish solution:

from lxml import etree
def render_contents(element):
    """
    Surely there is a safe lxml built-in for this...
    """
    tagname = element.tag
    return re.sub('</%s>\s*$' % tagname, '',
                  re.sub(r'^<%s(\s+\w+=".*?")*?>' % tagname,
                         '', etree.tostring(element))).strip()

Edit

Really, there's no better method than this?

meeselet
+1  A: 

You're most of the way there with your original idea. element.text gives you the first text child of the element, and your list comprehension gives you everything else. If you concatenate the two strings together, you get what you're looking for:

>>> xmlstr = "<sec>header <p>para 0</p> text <p>para 1</p> footer</sec>"
>>> element = etree.fromstring(xmlstr)
>>>
>>> element.text + "".join(map (etree.tostring, element))
'header <p>para 0</p> text <p>para 1</p> footer'
>>>

Ari.

iter