I am fetching a webpage (http://autoweek.com) and trying to process it, but I am getting an encoding error. Autoweek declares the "iso-8859-1" encoding and contains the word "Nürburgring" (u with umlaut).
I do:
# -*- encoding: utf-8 -*-
import urllib
webpage = urllib.urlopen(feed.crawl_url).read()
webpage.decode("utf-8")
It gives me the following error:
'utf8' codec can't decode bytes in position 7768-7773: unsupported Unicode code range
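For what it's worth, here is a minimal sketch that seems to reproduce the failure without the network. It assumes the page bytes really are ISO-8859-1, as the site declares; the names raw, utf8_ok, text and utf8_bytes are only illustrative and are not part of my real code.

```python
# Stand-in for the downloaded page: the word encoded as ISO-8859-1 bytes.
# (Assumption: the real bytes from urllib.urlopen(...).read() look like this.)
raw = u"N\xfcrburgring".encode("iso-8859-1")

# Decoding as UTF-8 fails: the byte 0xFC (u-umlaut in ISO-8859-1)
# does not begin a valid UTF-8 sequence here.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the declared codec gives a unicode object ...
text = raw.decode("iso-8859-1")

# ... which can then be re-encoded as UTF-8, e.g. for a utf8 database column.
utf8_bytes = text.encode("utf-8")
```

In this sketch the UTF-8 decode fails while the ISO-8859-1 decode succeeds, which makes me suspect the codec passed to .decode rather than the page itself.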
If I skip the .decode step and parse the page with the lxml library instead, I get an error when saving the parsed title to the database:
'utf8' codec can't decode bytes in position 45-50: unsupported Unicode code range
My database uses the utf8 character set and the utf8_general_ci collation.
My settings:
Django
Python 2.4.3
MySQL 5.0.22
MySQL-python 1.2.1
mod_python 3.2.8