I'm using Python on Google App Engine to fetch HTML pages and display them. My aim is to be able to fetch any page in any language, but I have a problem with encoding:
A simple
result = urllib2.urlopen(url).read()
leaves artifacts in place of special characters, and
urllib2.urlopen(url).read().decode('utf8')
throws an error:
'utf8' codec can't decode bytes in position 3544-3546: invalid data
How can I solve this? Is there a library that detects a page's encoding and converts the content so that it is readable?
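To make the question concrete, here is a minimal sketch of the kind of behaviour I'm after, using only the standard library. It assumes the encoding is declared either in the Content-Type header or in a meta tag near the top of the page, which won't be true for every site. The names decode_page, content_type and fallback are made up for illustration, and the function works on bytes that have already been fetched rather than doing the fetch itself.

```python
import codecs
import re

# Matches "charset=..." in a Content-Type header or an HTML <meta> tag.
_CHARSET_RE = re.compile(br'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
                         re.IGNORECASE)


def _declared_charset(data):
    """Return the charset name declared in `data` (bytes), or None."""
    match = _CHARSET_RE.search(data)
    return match.group(1).decode('ascii') if match else None


def _is_known_codec(name):
    """True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_page(raw, content_type=None, fallback='utf-8'):
    """Decode raw page bytes to text (hypothetical helper).

    Tries, in order: the charset from the Content-Type header, the
    charset declared in the first 2048 bytes of the page, then
    `fallback`. If all of them fail, undecodable bytes are replaced
    with U+FFFD instead of raising.
    """
    candidates = []
    if content_type:
        candidates.append(
            _declared_charset(content_type.encode('ascii', 'ignore')))
    candidates.append(_declared_charset(raw[:2048]))
    candidates.append(fallback)

    for name in candidates:
        if not name or not _is_known_codec(name):
            continue
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    return raw.decode(fallback, 'replace')
```

With urllib2 I would expect to call it roughly as decode_page(response.read(), response.info().getheader('Content-Type')). This doesn't help for pages that declare no encoding at all or declare the wrong one, which is why I'm asking whether a library can guess the encoding from the bytes themselves.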