That is, all text and subtags, without the tag of the element itself?
Having
<p>blah <b>bleh</b> blih</p>
I want
blah <b>bleh</b> blih
element.text returns "blah " and etree.tostring(element) returns:
<p>blah <b>bleh</b> blih</p>
ElementTree works perfectly well; you just have to assemble the answer yourself. Something like this:
# Python 3: iterate t directly (getchildren() was removed in 3.9) and request str rather than bytes
"".join([t.text or ""] + [xml.tostring(e, encoding="unicode") for e in t])
Thanks to JV and PEZ for pointing out the errors.
Edit.
>>> import xml.etree.ElementTree as xml
>>> s= '<p>blah <b>bleh</b> blih</p>\n'
>>> t=xml.fromstring(s)
>>> "".join([t.text or ""] + [xml.tostring(e, encoding="unicode") for e in t])
'blah <b>bleh</b> blih'
>>>
The tail is not needed: tostring() already includes each child's tail text (here " blih").
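To see why the tail is not needed, here is a small check (a sketch using the standard-library xml.etree.ElementTree; the variable names are only for illustration). It shows that the text after </b> is stored as the <b> element's tail and that tostring() writes it out together with the element:

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>blah <b>bleh</b> blih</p>")
b = p[0]

# " blih" belongs to <b> as its tail, not to <p>
print(repr(b.tail))
# tostring() serializes the element's tail along with the element itself
print(ET.tostring(b, encoding="unicode"))
```

This prints ' blih' and then <b>bleh</b> blih, which is why joining p.text with the serialized children already yields the full inner content.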
No idea whether an external library is an option, but assuming there is exactly one <p> with this text on the page, a jQuery solution would be:
alert($('p').html()); // returns blah <b>bleh</b> blih
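I doubt ElementTree is the right tool for this. But if you have strong reasons for using it, you could try stripping the root tag from the serialized fragment (this assumes the element has no tail, since tostring() writes the tail after the closing tag and the $ anchor would then miss it):
re.sub(r'(^<%s\b.*?>|</%s\b.*?>$)' % (element.tag, element.tag), '', ElementTree.tostring(element, encoding='unicode'))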
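A slightly more defensive variant of the regex idea is sketched below. It is not the answerer's code: the helper name inner_xml_by_regex is hypothetical, it assumes a plain (non-namespaced) tag, because tostring() writes prefixes such as ns0: that element.tag does not contain, and it drops the tail on a shallow copy so the original element is left untouched. Anchoring the end tag exactly (instead of using a lazy .*?) also keeps a nested child with the same tag from being swallowed.

```python
import copy
import re
import xml.etree.ElementTree as ElementTree


def inner_xml_by_regex(element):
    """Serialize element, then strip its outer start and end tags (hypothetical helper).

    Assumes a non-namespaced tag name.
    """
    # Shallow copy so the tail can be dropped without mutating the original;
    # tostring() would otherwise append the tail after the closing tag.
    bare = copy.copy(element)
    bare.tail = None
    xml_text = ElementTree.tostring(bare, encoding="unicode")
    tag = re.escape(element.tag)
    # [^>]* stays inside the start tag (ElementTree escapes ">" in attributes).
    xml_text = re.sub(r"\A<%s\b[^>]*>" % tag, "", xml_text)
    # The end tag must be the very last thing in the string.
    return re.sub(r"</%s>\Z" % tag, "", xml_text)


root = ElementTree.fromstring("<div><p>blah <b>bleh</b> blih</p> after</div>")
print(inner_xml_by_regex(root[0]))  # blah <b>bleh</b> blih
```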
This is the solution I ended up using:
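from xml.etree import ElementTree as etree  # lxml.etree also works here

def element_to_string(element):
    s = element.text or ""
    for sub_element in element:
        # tostring() already appends each child's tail text
        s += etree.tostring(sub_element, encoding="unicode")
    # element.tail lies outside the element (and may be None), so it is not added
    return s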
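One caveat with the approaches above that start from element.text: the parser stores that text unescaped, so an entity such as &amp; in the leading text comes back as a bare &, and the concatenated string is no longer well-formed markup. A sketch that re-escapes it with xml.sax.saxutils.escape (the helper name inner_xml is hypothetical; children and their tails are already escaped by tostring()):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(element):
    """Return the markup between element's start and end tags (hypothetical helper)."""
    # element.text is stored unescaped, so re-escape &, < and > before
    # concatenating it with serialized markup.
    parts = [escape(element.text or "")]
    for child in element:
        # Each child is serialized together with its own (escaped) tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


p = ET.fromstring("<p>a &amp; b <b>bleh</b> blih</p>")
print(inner_xml(p))  # a &amp; b <b>bleh</b> blih
```

Because the result is valid markup, it can be wrapped back in a tag and parsed again if needed.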