lxml will not parse a unicode string that also contains an XML encoding declaration.
So either you turn it into an encoded byte string (since you apparently can't keep it as a byte string, you will have to re-encode it before parsing), or you serialize the tree as unicode with lxml yourself: etree.tostring(tree, encoding=unicode), which leaves out the XML declaration. You can easily parse the result again with etree.fromstring, which accepts unicode strings as long as they carry no encoding declaration.
See http://codespeak.net/lxml/parsing.html#python-unicode-strings
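A minimal sketch of that round trip, assuming a small made-up document (the element names and text are only for illustration). It passes encoding='unicode' as a string, which lxml accepts and which also works on Python 3, where the unicode builtin no longer exists:

```python
from lxml import etree

# Hypothetical input: a UTF-8 byte string that carries an encoding declaration.
raw = b'<?xml version="1.0" encoding="utf-8"?><root><item>caf\xc3\xa9</item></root>'
tree = etree.fromstring(raw)

# Serializing to unicode produces text without an XML declaration.
text = etree.tostring(tree, encoding='unicode')

# A unicode string without a declaration can be parsed directly.
reparsed = etree.fromstring(text)
print(text)
```

Because the serialized text carries no declaration, there is nothing for the parser to disagree with when it reads the string back.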
Edit: If you already have the unicode string and can't control how it was made, you'll have to encode it again and tell the parser which encoding you used:
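from lxml import etree

# Always decode as UTF-8, whatever the XML declaration says.
utf8_parser = etree.XMLParser(encoding='utf-8')

def parse_from_unicode(unicode_str):
    s = unicode_str.encode('utf-8')  # re-encode to bytes before parsing
    return etree.fromstring(s, parser=utf8_parser)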
This makes sure that whatever encoding the XML declaration names is ignored, because the parser always uses UTF-8.
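As a quick check, here is a sketch that feeds the helper a unicode string whose declaration names a different encoding. The sample document is an assumption made up for illustration, and the helper is repeated only so the snippet runs on its own:

```python
from lxml import etree

utf8_parser = etree.XMLParser(encoding='utf-8')

def parse_from_unicode(unicode_str):
    # Encode to UTF-8 bytes and parse with a parser that is forced to UTF-8.
    s = unicode_str.encode('utf-8')
    return etree.fromstring(s, parser=utf8_parser)

# Hypothetical document: the declaration claims ISO-8859-1,
# but the data is really a unicode string.
doc = u'<?xml version="1.0" encoding="iso-8859-1"?><root>caf\xe9</root>'

root = parse_from_unicode(doc)
print(root.tag)
```

The text of the root element should come back with the accented character intact, since the bytes are decoded as UTF-8 and not as the declared encoding.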