I am trying to parse a UTF-8 encoded file with lxml. Every operation seems to work except writing the tree back to a file (or at least I think so). A minimal working example follows:
from lxml import etree
parser = etree.HTMLParser()
tree = etree.parse('example.txt', parser)
tree.write('aaaaaaaaaaaaaaaaa.html')
example.txt:
<html>
    <body>
        <invalid html here/>
        <interesting attrib1="yes">
            <group>
                <line>
                    δεδομένα1
                </line>
            </group>
            <group>
                <line>
                    δεδομένα2
                </line>
            </group>
            <group>
                <line>
                    δεδομένα3
                </line>
            </group>
        </interesting>
    </body>
</html> 
I am aware of a similar earlier question, but I could not solve the problem either by omitting the output encoding or by setting it to utf-8 or iso-8859-7.
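The encodings tried above only affect how the tree is serialized. This might be worth checking on the input side too, as an assumption rather than a confirmed fix: when a file declares no charset, libxml2's HTML parser has to guess the input encoding and may read it as a single-byte encoding. `etree.HTMLParser` accepts an `encoding` argument that pins the input encoding instead. The sketch below writes a trimmed, hypothetical copy of example.txt to a temporary directory so it runs on its own; the file names are illustrative only.

```python
import os
import tempfile

from lxml import etree

# Hypothetical trimmed copy of example.txt, saved as UTF-8 with no <meta charset>.
SAMPLE = """<html>
    <body>
        <interesting attrib1="yes">
            <group><line>δεδομένα1</line></group>
        </interesting>
    </body>
</html>"""

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write(SAMPLE)

# Tell the HTML parser the input encoding instead of letting it guess.
parser = etree.HTMLParser(encoding="utf-8")
tree = etree.parse(src, parser)

# Serialize as UTF-8, as in the original write call.
out = os.path.join(workdir, "out.html")
tree.write(out, encoding="utf-8")

with open(out, encoding="utf-8") as f:
    result = f.read()
print("δεδομένα1" in result)
```

If the Greek text survives this round trip, the problem is likely on the parsing side rather than in `tree.write`.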
I have concluded that the file is UTF-8 because it displays correctly in Chrome when I select that encoding. My editor (Kate) agrees.
I get no runtime error, but the Greek text in the output is garbled.
Example output with tree.write('aaaaaaaaaaaaaaaaa.html', encoding='utf-8'):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
        <invalid html="" here=""/><interesting attrib1="yes"><group><line>
                    δεδομÎνα1
                </line></group><group><line>
                    δεδομÎνα2
                </line></group><group><line>
                    δεδομÎνα3
                </line></group></interesting></body></html>
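The garbled characters are consistent with the UTF-8 bytes being decoded as a single-byte encoding such as ISO-8859-1 (Latin-1) somewhere on the way in. The stdlib-only sketch below is an illustration of that kind of misreading, not a diagnosis of lxml's internals. Every Greek letter is two bytes starting with 0xCE, which Latin-1 shows as "Î". For example, "έ" (U+03AD) is the bytes CE AD, and 0xAD is an invisible soft hyphen in Latin-1, which may explain why a lone "Î" appears where "έ" should be.

```python
# Illustration: UTF-8 Greek text decoded as Latin-1 (ISO-8859-1).
text = "δεδομένα1"

raw = text.encode("utf-8")        # each Greek letter becomes two bytes, 0xCE then another byte
misread = raw.decode("latin-1")   # each byte becomes one character, e.g. 0xCE -> 'Î'

# Shows the two-characters-per-letter pattern, including the soft hyphen '\xad'.
print(repr(misread))

# As long as no bytes were lost, the damage can be undone by reversing the steps.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == text)
```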