Been banging my head on this for a while and I've read a bunch of articles and the issue isn't any clearer. I have a bunch of strings stored in my database, imagine the following:
x = '\xd0\xa4'
y = '\x92'
At the Python shell I get the following:
print x
Ф
print y
?
Which is exactly what I want to see. However then there is the following:
print unicode(x, 'utf8')
Ф
But not this:
unicode(y, 'utf8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 0: unexpected code byte
My feeling is that our strings are getting mangled because Django tries to convert them to unicode, but I'm just guessing at this point. Any insights or workarounds appreciated.
UPDATE: When I look at the database at the row that contains the '\x92' value, I see this character as ’. An apostrophe. I'm viewing the contents of the database using a Unicode UTF-8 encoding.