I got the following error when I tried to add an entry to a Django model via generic relations.

django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 24: unexpected code byte. You passed in 'ASL/60Styles_Timeless-3_\xb8 CaLe.asl' (<type 'str'>)

The model is like this:

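# Imports assumed for the older Django release this code targets
from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes import generic

class MD5(models.Model):
    value = models.CharField(max_length=32, db_index=True)
    filename = models.CharField(max_length=100)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()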

The table's charset is utf8 and its collation is utf8_general_ci.

Does this mean the filename is not a valid UTF-8 string? How can I fix this error, or convert the invalid string to a valid format?

+2  A: 

Your file system is apparently not using UTF-8 encoding:

>>> a = 'ASL/60Styles_Timeless-3_\xb8 CaLe.asl'
>>> print a.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 24: unexpected code byte
>>> a.decode('iso8859-2')
u'ASL/60Styles_Timeless-3_\xb8 CaLe.asl'
>>> print a.decode('iso8859-2')
ASL/60Styles_Timeless-3_¸ CaLe.asl
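
To see why the UTF-8 decode fails, it may help to inspect the exception itself. This is only a minimal sketch, assuming Python 2.6+ or Python 3 (the `b'...'` literal and `except ... as` syntax are not available on 2.5); the variable names are made up for illustration. In UTF-8, a byte of the form 10xxxxxx may only follow a lead byte, and 0xb8 here follows a plain ASCII underscore:

```python
raw = b"ASL/60Styles_Timeless-3_\xb8 CaLe.asl"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte that could not be decoded
    bad_index = exc.start
    bad_byte = raw[exc.start:exc.start + 1]

# 0xb8 is 0b10111000: the top two bits are "10", which marks a UTF-8
# continuation byte. It is invalid without a preceding lead byte.
is_continuation = (ord(bad_byte) & 0xC0) == 0x80

print("offset %d, continuation byte: %s" % (bad_index, is_continuation))
```

So yes, the filename as stored on disk is not valid UTF-8; it was most likely written in a single-byte encoding.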

Edit: I've only now realized that the bytes in your string already correspond one-to-one to Unicode code points (as in Latin-1), even though its type is still str. Try this to get a unicode object:

>>> a.decode('raw_unicode_escape')
u'ASL/60Styles_Timeless-3_\xb8 CaLe.asl'
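
If you need to do this for many filenames before saving them, one option is a small helper that tries UTF-8 first and falls back to a single-byte encoding. This is a rough sketch, not a definitive fix: `to_unicode` is a hypothetical name, and `iso8859-2` is only a guess at the real encoding of your files, so substitute whatever your data actually uses. It assumes Python 2.6+ or Python 3.

```python
def to_unicode(raw, encodings=("utf-8", "iso8859-2")):
    """Decode a byte string, trying each candidate encoding in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort if every candidate failed: decode as UTF-8 and put
    # U+FFFD in place of the bytes that could not be decoded.
    return raw.decode("utf-8", "replace")


name = to_unicode(b"ASL/60Styles_Timeless-3_\xb8 CaLe.asl")

# Once decoded, the text can be encoded as valid UTF-8 for storage.
utf8_bytes = name.encode("utf-8")
```

The result is a unicode object, which Django can store in a utf8 table without raising DjangoUnicodeDecodeError. Note that a single-byte fallback will accept almost any input, so a wrong guess produces wrong characters rather than an error.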

It's running on Ubuntu, and the system locale has been set to en_US.UTF-8. How can I see the current file system encoding, and how do I set it to UTF-8?
jack
I've edited to add a possible solution. Hope this helps.
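
On the question in the comments about checking the file system encoding: Python reports what it assumes through the standard `sys` and `locale` modules. A small sketch (the variable names are just for illustration):

```python
import locale
import sys

# Encoding Python uses to convert between unicode filenames and the
# byte strings the operating system works with.
fs_encoding = sys.getfilesystemencoding()

# Encoding implied by the user's locale settings (e.g. LANG, LC_ALL).
preferred = locale.getpreferredencoding()

print("file system encoding: %s" % fs_encoding)
print("locale preferred encoding: %s" % preferred)
```

Keep in mind that on Linux a filename is just a sequence of bytes. Setting the locale to en_US.UTF-8 changes what Python assumes, but it does not rewrite names that were created earlier in another encoding, so existing files like this one still need to be decoded explicitly or renamed.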