ansaurus

Question

Unable to decode unicode string in Python 2.4

Answer 1

+2 A:

You need to use "ISO-8859-1":

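Name = 'w\xf6rner'.decode('iso-8859-1')
# "file" is the open file object from the question; encode explicitly before writing to a byte stream
file.write(('Name: %s - %s\n' % (Name, type(Name))).encode('utf-8'))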

UTF-8 uses at least two bytes for any character outside ASCII, but here the 'ö' is a single byte (\xf6), so ISO-8859-1 is probably correct.
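The byte-count argument can be checked by encoding 'ö' both ways and then attempting the failing decode. This is a minimal sketch written for Python 3, where the thread's byte `str` corresponds to `bytes` and `unicode` corresponds to `str`; the variable names are illustrative and not from the question.

```python
# Python 3 sketch: bytes plays the role of Python 2's str.
raw = b'w\xf6rner'  # the byte string from the question

# 'ö' (U+00F6) is one byte in ISO-8859-1 but two bytes in UTF-8.
latin1_bytes = '\xf6'.encode('iso-8859-1')
utf8_bytes = '\xf6'.encode('utf-8')

# A lone 0xf6 byte is not valid UTF-8, so this decode raises.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with a 1-byte encoding succeeds.
name = raw.decode('iso-8859-1')

print(len(latin1_bytes), len(utf8_bytes), utf8_ok)
```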

Staale 2009-03-20 14:41:06

Answer 2

+3 A:

Your string is not UTF-8 encoded. To decode a byte string to unicode, the string must actually be in the encoding you pass as the parameter. I tried this and it works perfectly:

print 'w\xf6rner'.decode('cp1250')

EDIT

For writing unicode strings to the file you can use codecs module:

import codecs
f = codecs.open("yourfile.txt", "w", "utf8")
f.write( ... )

It is handy to specify the encoding once, at the input/output boundary, and to use unicode strings throughout the rest of your code without worrying about different encodings.
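To make the codecs suggestion concrete, here is a small round-trip sketch. It assumes Python 3 (where a plain `str` is already a unicode string) and writes to a scratch file in a temporary directory; the path and variable names are hypothetical.

```python
import codecs
import os
import tempfile

name = 'w\xf6rner'  # a text string (unicode in Python 2 terms)

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'yourfile.txt')

# The codecs wrapper encodes text to UTF-8 on the way out.
with codecs.open(path, 'w', 'utf8') as f:
    f.write('Name: %s\n' % name)

# Raw bytes on disk: 'ö' has become the two bytes \xc3\xb6.
with open(path, 'rb') as fh:
    on_disk = fh.read()

# Reading back through the same wrapper decodes to text again.
with codecs.open(path, 'r', 'utf8') as f:
    round_trip = f.read()
```

The program only ever handles text; the encoding is named once when the file is opened.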

Jiri 2009-03-20 14:43:51

Answer 3

+2 A:

It's obviously a 1-byte encoding. 'ö' in UTF-8 is '\xc3\xb6'.

The encoding might be:

ISO-8859-1
ISO-8859-2
ISO-8859-13
ISO-8859-15
Win-1250
Win-1252
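The candidates above can be tried one by one. This is a sketch for Python 3 (`bytes` stands in for Python 2's `str`); it assumes Win-1250 and Win-1252 map to Python's `cp1250` and `cp1252` codec names.

```python
raw = b'w\xf6rner'  # the byte string from the question

# Python codec names for the encodings listed above.
candidates = [
    'iso-8859-1',
    'iso-8859-2',
    'iso-8859-13',
    'iso-8859-15',
    'cp1250',
    'cp1252',
]

# Decode the same bytes with every candidate encoding.
decoded = dict((enc, raw.decode(enc)) for enc in candidates)

for enc in candidates:
    print('%-12s %r' % (enc, decoded[enc]))

# If every candidate yields the same text, the bytes alone
# cannot tell these encodings apart.
distinct = set(decoded.values())
```

For this particular byte all six candidates map \xf6 to 'ö', so picking between them takes knowledge of where the data came from.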

vartec 2009-03-20 14:55:11

Answer 4

+1 A:

"So in my code to reproduce I changed '\xf6' to '\xc3\xb6', and the failure still occurs"

Not in the first line it doesn't:

>>> 'w\xc3\xb6rner'.decode('utf-8')
u'w\xf6rner'

The second line will error out though:

>>> file.write('Name: %s - %s\n' %(Name, type(Name)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 7: ordinal not in range(128)

Which is entirely what you'd expect when trying to write non-ASCII Unicode characters to a byte stream. If you use Jiri's suggestion of a codecs-wrapped stream you can write Unicode directly; otherwise you will have to re-encode the Unicode string into bytes manually.
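Both options can be sketched against an in-memory byte stream. This assumes Python 3 and uses `io.BytesIO` as a stand-in for the real file; note that Python 3 raises a TypeError when text is written to a byte stream, rather than attempting the implicit ASCII encode that produces the error above in Python 2.

```python
import codecs
import io

name = b'w\xc3\xb6rner'.decode('utf-8')
line = 'Name: %s\n' % name

# Option 1: re-encode manually before writing to the byte stream.
manual = io.BytesIO()
manual.write(line.encode('utf-8'))

# Option 2: wrap the byte stream so it accepts text directly.
raw_stream = io.BytesIO()
wrapped = codecs.getwriter('utf-8')(raw_stream)
wrapped.write(line)

# The ASCII codec cannot represent 'ö'; this is the same
# failure that Python 2 triggers implicitly.
try:
    line.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Both streams end up holding the same UTF-8 bytes; the wrapper just moves the encode call out of the calling code.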

Better, for logging purposes, would be simply to spit out a repr() of the variable. Then you don't have to worry about Unicode characters being in there, or newlines or other unwanted characters:

name = 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %r\n' % name)

Name: u'w\xf6rner'
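For readers on Python 3, where `repr()` leaves printable non-ASCII characters intact, the built-in `ascii()` gives the escaped form shown above. A minimal sketch, with illustrative variable names:

```python
name = b'w\xc3\xb6rner'.decode('utf-8')

# ascii() escapes every non-ASCII character, so the log line
# is safe to write to any byte stream.
log_line = 'Name: %s\n' % ascii(name)

# Newlines inside the value are escaped too, keeping one
# record per log line.
multi = ascii('line one\nline two')

print(log_line, end='')
print(multi)
```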

bobince 2009-03-20 16:01:34
