ansaurus

Question

Answer 1

+2 A:

It looks like the bytes were auto-decoded as latin1 when they are really GBK. To fix it, encode back to bytes with latin1 and decode as GBK:

>>> title = u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
>>> print title.encode('latin1').decode('GBK')
当你孤单你会想起谁

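Tested in Python 2.x. The same encode/decode round trip works in Python 3, but print is a function there, so write print(title.encode('latin1').decode('GBK')).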

Max Shawabkeh 2010-02-03 09:41:25

Answer 2

+4 A:

It looks like the string has been decoded to unicode using the wrong encoding (latin-1).

You need to encode it to a byte string and then decode it back to unicode using the correct encoding.

title = u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
print title.encode('latin-1').decode('gbk')
当你孤单你会想起谁
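For Python 3, the same round trip can be wrapped in a small helper. This is only a sketch: the name fix_mojibake and its parameters are made up for illustration, and it assumes the text was wrongly decoded as latin-1 and that the real encoding is GBK, as in this question.

```python
def fix_mojibake(text, wrong="latin-1", right="gbk"):
    """Undo a decode that used the wrong codec (hypothetical helper)."""
    # latin-1 maps code points 0-255 one-to-one onto bytes, so encoding
    # with it recovers the original byte string unchanged.
    raw = text.encode(wrong)
    # Decoding those bytes with the intended codec yields the real text.
    return raw.decode(right)


title = '\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
print(fix_mojibake(title))
```

If the guess about either codec is wrong, encode() or decode() will usually raise UnicodeEncodeError or UnicodeDecodeError, which is a useful signal that a different pair of encodings is involved.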

DisplacedAussie 2010-02-03 09:46:12

How to correct the mis-encoded string?