views: 123
answers: 4

Does anyone know why the string conversion functions throw exceptions when errors="ignore" is passed? How can I convert from regular Python string objects to unicode without errors being thrown? Thanks very much!

python -c "import codecs; codecs.open('tmp', 'wb', encoding='utf8', errors='ignore').write('кошка')"

returns
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.6/codecs.py", line 686, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

EDIT -- thanks for the responses, but does anyone know how to convert the literal above without using the "u" prefix? The reason being that you could, of course, be dealing with something that isn't a constant :)

A: 

The write method (in Python 2) takes a unicode object, and you're passing it a str -- so the encode call in codecs.py line 351 is first trying to build a unicode object (with the default codec, 'ascii'). Fix is easy: change the write call to

write(u'кошка')

The u prefix tells Python you're using a Unicode object, and it should be fine.
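
As a minimal sketch of the same fix in script form (assuming the source file is saved as UTF-8, which the coding declaration announces to Python 2):

# -*- coding: utf-8 -*-
import codecs

# The codecs writer expects a unicode object; the u prefix makes the
# literal unicode, so no implicit ascii decode is attempted.
codecs.open('tmp', 'wb', encoding='utf8', errors='ignore').write(u'кошка')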

Alex Martelli
+1  A: 

problem is here ===>>>> write('кошка')

You are writing a str object, the recipient is expecting a unicode object, so it tries to convert it to unicode using the default encoding (ascii), which of course (?) produces the well-known (?) UnicodeDecodeError: 'ascii' codec can't decode byte 0xXX in position 0: ordinal not in range(128)

The whole point of using the codecs module like that is to get it to encode your unicode objects to UTF-8 on the fly -- so feed it unicode objects.

Update -- how to convert the literal (or any non-literal):

unicode_object = literal_or_whatever.decode("UNKNOWN_ENCODING")
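
For example, if the bytes happen to be UTF-8 (an assumption -- substitute whatever encoding your data really arrives in), a sketch:

# -*- coding: utf-8 -*-
import codecs

raw = 'кошка'               # a plain str (bytes); stands in for non-constant data
text = raw.decode('utf-8')  # decode the bytes to a unicode object first
codecs.open('tmp', 'wb', encoding='utf8', errors='ignore').write(text)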

Do you know how your literal is encoded? Would you like to tell us what you are trying to accomplish? A one-liner with python -c isn't much help ;-)

John Machin
The unicode() function doesn't work; it throws the same exception.
gatoatigrado
@gatoatigrado: I said to feed it unicode; I didn't say to use the `unicode()` function. If you use `unicode()` without specifying an encoding, OF COURSE it will get the same exception (default encoding is ascii). And please read the last sentence of my answer.
John Machin
Ah, sorry, I was just scanning. The question was "how" do I feed it unicode from a Python string. The logical way would be `unicode(pystr)`, but that doesn't work. I think I've used dumb tricks with bytes() before, but I'd like to know what the real solution is.
gatoatigrado
@gatoatigrado: unicode(str_object) as already explained is NOT the "logical" way (whatever that means) -- you need to know what encoding (e.g. "cp1252") your str_object is in, and then do `str_object.decode(that_encoding)` or `unicode(str_object, that_encoding)` (these are equivalent)
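
To make that equivalence concrete (a minimal sketch, assuming the bytes are UTF-8):

# -*- coding: utf-8 -*-
s = 'кошка'  # str (bytes)
assert s.decode('utf-8') == unicode(s, 'utf-8')  # both produce the same unicode object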
John Machin
+1  A: 

A non-solution (from the question author) I just found out about: use Python 3.

python3 -c "import codecs; codecs.open('tmp', 'wb', encoding='utf8', errors='ignore').write('кошка')"
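
In Python 3 the codecs dance isn't needed at all, since str is already unicode and the built-in open encodes on write (a sketch of the idiomatic equivalent):

# Python 3: str literals are already unicode; open() encodes on the way out.
open('tmp', 'w', encoding='utf8', errors='ignore').write('кошка')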
gatoatigrado
Yes, Python 3 has native unicode support. =]
Xavier Ho
+1  A: 

In Python 2.x use write('кошка'.decode('utf-8')) instead of write('кошка').

You can use another encoding instead of 'utf-8' too, as long as it matches the actual encoding of your bytes.

Hopefully it will not throw any errors ...
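
Applied to the original one-liner (assuming your terminal is UTF-8, so the bytes in the literal really are UTF-8):

python -c "import codecs; codecs.open('tmp', 'wb', encoding='utf8', errors='ignore').write('кошка'.decode('utf-8'))"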

aberry