The following Python code ...
import codecs
import urllib2

html_data = urllib2.urlopen(some_url).read()
f = codecs.open(filename, 'w', encoding='utf-8')
f.write(html_data)
f.close()
... sometimes fails with UnicodeDecodeError
...
File "/.../lib/python2.6/codecs.py", line 686, in write
return self.writer.write(data)
File "/.../lib/python2.6/codecs.py", line 351, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 5605: ordinal not in range(128)
My questions:
- How do I make sure my urllib2.urlopen(some_url).read() call always returns UTF-8?
- Is there anything wrong with my codecs.open(...) call that prevents it from storing my data to disk in UTF-8 encoding?
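
For context on the traceback: in Python 2, read() returns a byte string, not unicode. The writer returned by codecs.open(..., encoding='utf-8') expects unicode, so when it is handed bytes it first decodes them implicitly with the ASCII codec, and that step fails on any byte >= 0x80 such as 0xd0. Below is a minimal sketch of one way around it, assuming the page's encoding is known: decode the bytes explicitly before writing. The helper name save_as_utf8 and its source_encoding parameter are hypothetical, and the sample bytes stand in for the result of urlopen(...).read(); the real encoding depends on what the server sends.

```python
import codecs
import os
import tempfile


def save_as_utf8(raw_bytes, filename, source_encoding):
    """Hypothetical helper: decode raw bytes explicitly, then write them as UTF-8."""
    # Explicit decode, so no implicit ASCII decode happens inside the writer.
    text = raw_bytes.decode(source_encoding)
    # codecs.open encodes the unicode text to UTF-8 on write.
    f = codecs.open(filename, 'w', encoding='utf-8')
    try:
        f.write(text)
    finally:
        f.close()


# In-memory bytes standing in for urllib2.urlopen(some_url).read().
# Assumption: the data is already UTF-8 (0xd0 is a common UTF-8 lead byte).
sample = b'\xd0\x9f\xd1\x80'

path = os.path.join(tempfile.mkdtemp(), 'page.html')
save_as_utf8(sample, path, 'utf-8')
```

If the bytes are already UTF-8 and no re-encoding is needed, writing them unchanged to a file opened with plain open(filename, 'wb') would also avoid the implicit decode.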