ansaurus

Question

How do I get the whole text of an element using ElementTree?

Answer 1

+5 A:

ElementTree works perfectly; you just have to assemble the answer yourself. Something like this:

"".join([t.text or ""] + [xml.tostring(e, encoding="unicode") for e in t])

Thanks to JV and PEZ for pointing out the errors.

Edit.

>>> import xml.etree.ElementTree as xml
>>> s= '<p>blah <b>bleh</b> blih</p>\n'
>>> t=xml.fromstring(s)
>>> "".join([t.text] + [xml.tostring(e, encoding="unicode") for e in t])
'blah <b>bleh</b> blih'
>>>

Tail not needed.
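
As a small sketch of why the tail takes care of itself here (assuming Python 3 and the standard-library xml.etree.ElementTree): text that follows a child's closing tag is stored on that child as its tail, and tostring() writes the tail along with the element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>blah <b>bleh</b> blih</p>")
child = root[0]

# Text before the first child lives on the parent; text after a child's
# closing tag lives on that child as its tail.
print(repr(root.text))   # 'blah '
print(repr(child.tail))  # ' blih'

# tostring() serializes an element together with its tail.
print(ET.tostring(child, encoding="unicode"))  # <b>bleh</b> blih
```

So joining the parent's text with the serialized children already covers everything between the opening and closing tags.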

S.Lott 2008-12-19 11:21:52

Just pointing out a typo - method name - "finall" which I think should have been "findall". Even if findall is used it results in this http://pastebin.com/f6de9a841. Please revise your answer.

JV 2008-12-19 11:45:49

@JV: thanks. Fixed.

S.Lott 2008-12-19 12:19:25

I'm doing something similar to that, but with a for loop. You are actually missing the tail.

J. Pablo Fernández 2008-12-19 17:26:25

I like. But it doesn't work with s = '<p></p>'

PEZ 2008-12-19 20:43:00

@PEZ: thanks. Fixed.

S.Lott 2008-12-19 20:48:56

Why is tail not needed?

Joseph Turian 2010-01-22 20:05:03

The tail is the extra whitespace after the closing tag of the construct.

S.Lott 2010-01-22 20:24:40

Answer 2

A:

No idea if an external library might be an option, but anyway -- assuming there is one <p> with this text on the page, a jQuery-solution would be:

alert($('p').html()); // returns blah <b>bleh</b> blih

Till 2008-12-19 11:23:59

Answer 3

A:

I doubt ElementTree is the thing to use for this. But assuming you have strong reasons for using it maybe you could try stripping the root tag from the fragment:

re.sub(r'(^<%s\b.*?>|</%s\b.*?>$)' % (element.tag, element.tag), '', ElementTree.tostring(element, encoding='unicode'))
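
A runnable sketch of that idea, assuming Python 3 and the standard-library ElementTree. The helper name strip_root_tag is hypothetical, and re.escape is an addition not present in the one-liner above. It is only meant for simple fragments: it assumes the element has no tail and does not contain a nested element with the same tag.

```python
import re
from xml.etree import ElementTree

def strip_root_tag(element):
    """Serialize element and strip its own opening and closing tags (hypothetical helper)."""
    tag = re.escape(element.tag)  # guard against regex metacharacters in the tag name
    pattern = r'(^<%s\b.*?>|</%s\b.*?>$)' % (tag, tag)
    # encoding="unicode" makes tostring() return str rather than bytes
    return re.sub(pattern, '', ElementTree.tostring(element, encoding='unicode'))

element = ElementTree.fromstring('<p>blah <b>bleh</b> blih</p>')
print(strip_root_tag(element))  # blah <b>bleh</b> blih
```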

PEZ 2008-12-19 11:56:30

Answer 4

+1 A:

This is the solution I ended up using:

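import xml.etree.ElementTree as etree  # assumed import; not shown in the original

def element_to_string(element):
    # Text before the first child; None when the element has no text.
    s = element.text or ""
    for sub_element in element:
        # tostring() also emits the child's tail text.
        s += etree.tostring(sub_element, encoding="unicode")
    # Tail is None when nothing follows the closing tag.
    s += element.tail or ""
    return s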

J. Pablo Fernández 2008-12-19 17:27:09

That would fail when there's no text or no tail, wouldn't it?

PEZ 2008-12-19 20:35:34

PEZ, yes, it fails when there's no text, just found it by running my code and fixed it. I have many instances of no tail and that doesn't fail. Not sure why.

J. Pablo Fernández 2008-12-20 17:02:04

Just a nitpick: += on strings is less performant. It's best to accumulate a list of strings and ''.join it at the end.

cdleary 2008-12-20 22:36:44
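
A sketch of that suggestion applied to the element_to_string idea, assuming Python 3 and the standard-library ElementTree. The name inner_xml is hypothetical, and unlike element_to_string this version leaves out the element's own tail.

```python
import xml.etree.ElementTree as ET

def inner_xml(element):
    """Return the text and serialized children inside element (hypothetical helper)."""
    parts = [element.text or ""]  # text is None for an empty element
    # tostring() already appends each child's tail text
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)

print(inner_xml(ET.fromstring("<p>blah <b>bleh</b> blih</p>")))  # blah <b>bleh</b> blih
print(repr(inner_xml(ET.fromstring("<p></p>"))))                 # ''
```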
