My XML file looks like this:
<strings>
<string>Bla <b>One &amp; Two</b> Foo</string>
</strings>
I want to extract the content of each <string> element while keeping the inner tags. That is, I would like to get the following Python string: u"Bla <b>One & Two</b> Foo". Alternatively, I could settle for u"Bla <b>One &amp; Two</b> Foo" and then replace the entities myself.
I am currently using lxml, which lets me either iterate over the nested tags, which skips the text that is not inside a nested tag, or iterate over all the text content (itertext), which loses the tag information. I'm probably missing something.
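For reference, here is a minimal sketch of the two attempts described above. The variable names are made up for illustration, the XML is inlined instead of read from my file, and the ImportError fallback is only there so the snippet also runs without lxml installed:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for this sketch only: the standard library exposes the
    # same calls used below, so it serves as a stand-in when lxml is absent.
    import xml.etree.ElementTree as etree

xml = "<strings><string>Bla <b>One &amp; Two</b> Foo</string></strings>"
root = etree.fromstring(xml)
string_el = root.find("string")

# Attempt 1: iterate over the nested tags.
# Only <b> is visited; "Bla " and " Foo" never show up.
child_tags = [child.tag for child in string_el]

# Attempt 2: itertext() walks all the text, but the <b> markup is gone.
flat_text = "".join(string_el.itertext())

print(child_tags)
print(flat_text)
```

Neither result is the string I'm after: the first has no surrounding text, and the second has no tags.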
If possible I'd prefer to keep lxml, though I can switch to another library if necessary.