That is, all text and subtags, without the tag of an element itself?

Having

<p>blah <b>bleh</b> blih</p>

I want

blah <b>bleh</b> blih

element.text returns "blah " and etree.tostring(element) returns:

<p>blah <b>bleh</b> blih</p>
+5  A: 

ElementTree works perfectly; you just have to assemble the answer yourself. Something like this...

"".join([t.text or ""] + [xml.tostring(e, encoding="unicode") for e in t])

Thanks to JV and PEZ for pointing out the errors.


Edit.

>>> import xml.etree.ElementTree as xml
>>> s= '<p>blah <b>bleh</b> blih</p>\n'
>>> t=xml.fromstring(s)
>>> "".join([t.text or ""] + [xml.tostring(e, encoding="unicode") for e in t])
'blah <b>bleh</b> blih'
>>>

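Tail not needed: tostring() already emits each child's tail, and the tail of the element itself is not part of its content.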

S.Lott
Just pointing out a typo - method name - "finall" which I think should have been "findall". Even if findall is used it results in this http://pastebin.com/f6de9a841. Please revise your answer.
JV
@JV: thanks. Fixed.
S.Lott
I'm doing something similar to that, but with a for loop. You are actually missing the tail.
J. Pablo Fernández
I like. But it doesn't work with s = '<p></p>'
PEZ
@PEZ: thanks. Fixed.
S.Lott
Why is tail not needed?
Joseph Turian
The tail is the extra whitespace after the closing tag of the construct.
S.Lott
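To make the tail discussion above concrete, here is a small sketch using the standard library's xml.etree.ElementTree and the example from the question. The variable names are only illustrative. In ElementTree, tail holds whatever text follows an element's closing tag, whitespace or not, and tostring() on a child emits that tail along with the child.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>blah <b>bleh</b> blih</p>")
b = p[0]  # the <b> child

text_before_child = p.text  # text between <p> and <b>
child_tail = b.tail         # text after </b>, stored on the child
root_tail = p.tail          # nothing follows </p> in this fragment

# Serializing the child emits its tail as well
serialized_child = ET.tostring(b, encoding="unicode")
```

So joining the parent's text with the serialized children already covers " blih", which is presumably why the answer does not add a tail separately.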
A: 

No idea if an external library might be an option, but anyway -- assuming there is one <p> with this text on the page, a jQuery-solution would be:

alert($('p').html()); // returns blah <b>bleh</b> blih
Till
A: 

I doubt ElementTree is the thing to use for this. But assuming you have strong reasons for using it maybe you could try stripping the root tag from the fragment:

 re.sub(r'(^<%s\b.*?>|</%s\b.*?>$)' % (element.tag, element.tag), '', ElementTree.tostring(element, encoding='unicode'))
PEZ
+1  A: 

This is the solution I ended up using:

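import xml.etree.ElementTree as etree  # or: from lxml import etree

def element_to_string(element):
    s = element.text or ""
    for sub_element in element:
        # tostring() also emits each child's tail text
        s += etree.tostring(sub_element, encoding="unicode")
    s += element.tail or ""  # tail may be None
    return s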
J. Pablo Fernández
That would fail when there's no text or no tail, wouldn't it?
PEZ
PEZ, yes, it fails when there's no text, just found it by running my code and fixed it. I have many instances of no tail and that doesn't fail. Not sure why.
J. Pablo Fernández
Just a nitpick: += on strings is less performant. It's best to accumulate a list of strings and ''.join it at the end.
cdleary
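A minimal sketch of the list-and-join variant suggested in the last comment, assuming the standard library's xml.etree.ElementTree. The name inner_xml is made up for illustration. Unlike the function above, this version leaves out the element's own tail, since that text sits outside the element.

```python
import xml.etree.ElementTree as ET

def inner_xml(element):
    """Return the text and child markup of element, without its own tag."""
    # Collect the pieces in a list and join once at the end
    parts = [element.text or ""]  # text may be None for an empty element
    for child in element:
        # tostring() emits the child's tail text as well
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

example = inner_xml(ET.fromstring("<p>blah <b>bleh</b> blih</p>"))
empty = inner_xml(ET.fromstring("<p></p>"))
```

For the question's input this should give the same string as the one-liner in the first answer, and it returns an empty string for '<p></p>'.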