ansaurus

Question

Decoding not reversing unicode encoding in Django/Python

Answer 1

+1 A:

Wrong parameter name? From the doc, I can see the keyword argument name is supposed to be encoding and not coding.

Clément 2010-04-12 20:54:56

I edited the question, but I am using the right parameter name. I just solved the problem. For some reason, in the shell I get:

>>> u'catégorie'.encode('utf-8')
'cat\xc3\xa9gorie'
>>> 'cat\xc3\xa9gorie'.decode('utf-8')
u'cat\xe9gorie'

So I was worried that the string would not be output correctly to XML. That was a wrong assumption.

PhilGo20 2010-04-12 21:09:13

Answer 2

+3 A:

The coding header in your source file tells Python what encoding your source is in. It's the encoding Python uses to decode the source of the unicode string literal (u"Par Catégorie") into a unicode object. The unicode object itself has no encoding; it's raw unicode data. (Internally, Python will use one of two encodings, depending on how it was configured, but Python code shouldn't worry about that.)
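To illustrate the point that a unicode object has no encoding of its own, here is a minimal sketch. It is written for Python 3 (an assumption, since the question uses Python 2, where the literal would be spelled u'catégorie'); the variable names are made up for the example.

```python
# Python 3 sketch; in Python 2 the text literal would be written u'catégorie'.
text = 'catégorie'

# The same text can be serialized with different encodings, giving different bytes.
utf8_bytes = text.encode('utf-8')
latin1_bytes = text.encode('latin-1')

# Decoding with the matching encoding recovers an equal text object.
round_trip_utf8 = utf8_bytes.decode('utf-8')
round_trip_latin1 = latin1_bytes.decode('latin-1')

# '\xe9' in a repr is only an escaped spelling of 'é', not a different string.
same_text = (text == 'cat\xe9gorie')

print(utf8_bytes)
print(latin1_bytes)
print(round_trip_utf8 == text, round_trip_latin1 == text, same_text)
```

The two byte sequences differ, yet both decode back to the same text, which suggests that the encoding belongs to the bytes and not to the unicode object.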

The UnicodeDecodeError you get means that somewhere, you are mixing unicode strings and bytestrings (normal strings). When mixing them together (concatenating, performing string interpolation, et cetera), Python will try to convert the bytestring into a unicode string by decoding the bytestring using the default encoding, ASCII. If the bytestring contains non-ASCII data, this will fail with the error you see. The operation being done may be in a library somewhere, but it still means you're mixing inputs of different types.
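The implicit conversion described above can be imitated roughly as follows. This is a Python 3 sketch, not Python 2's actual machinery: mix_like_python2 is a hypothetical helper that performs the ASCII decode explicitly, and the sample strings are assumptions chosen to match the question.

```python
# Python 3 sketch of what Python 2 does implicitly when a bytestring
# meets a unicode string: it decodes the bytestring as ASCII.
byte_string = 'catégorie'.encode('utf-8')   # the bytes b'cat\xc3\xa9gorie'
prefix = 'Par '


def mix_like_python2(text, raw):
    """Hypothetical helper imitating Python 2's implicit ASCII decode."""
    return text + raw.decode('ascii')


# Non-ASCII bytes make the ASCII decode fail.
try:
    mix_like_python2(prefix, byte_string)
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

# ASCII-only bytes decode without complaint.
ascii_result = mix_like_python2(prefix, b'category')

# Decoding explicitly with the real encoding avoids the failure.
fixed = prefix + byte_string.decode('utf-8')

print(error_name)
print(ascii_result)
print(fixed == 'Par catégorie')
```

The usual remedy is the last step: decode bytestrings with their real encoding at the point where they enter the program, and work with unicode strings from there on.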

Unfortunately, the fact that it works just fine as long as the bytestrings contain only ASCII data means this type of error is all too frequent, even in library code. Python 3.x solves that problem by getting rid of the implicit conversion between unicode strings (just str in 3.x) and bytestrings (the bytes type in 3.x).
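A small Python 3 sketch of that behaviour, with illustrative variable names: mixing str and bytes raises a TypeError regardless of whether the bytes happen to be ASCII, so the mistake shows up immediately instead of only on non-ASCII input.

```python
text = 'Par '


def try_concat(left, right):
    """Return the name of the exception raised by left + right, or None."""
    try:
        left + right
    except TypeError as exc:
        return type(exc).__name__
    return None


# Non-ASCII bytes: rejected, no implicit decode is attempted.
non_ascii_outcome = try_concat(text, b'cat\xc3\xa9gorie')

# ASCII-only bytes: rejected as well, so the bug cannot hide.
ascii_outcome = try_concat(text, b'category')

# An explicit decode is the only way to combine the two types.
combined = text + b'cat\xc3\xa9gorie'.decode('utf-8')

print(non_ascii_outcome, ascii_outcome)
print(combined == 'Par cat\xe9gorie')
```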

Thomas Wouters 2010-04-12 21:41:28

Bingo. I was indeed mixing a bytestring and a unicode string somewhere. I guess I should always use unicode strings. Thanks for the clear explanation.

PhilGo20 2010-04-13 20:09:28
