ansaurus

Question

Answer 1

+2 A:

UnicodeEncodeError: 'ascii' codec

write is trying to encode the string using the ascii codec, which has no way of encoding accented characters like é or à.

Instead use

import codecs
with codecs.open("/tmp/test.txt",'w',encoding='utf-8') as f:   
    f.write(all.decode('utf-8'))

or choose some other codec (like cp1252) which can encode the characters in your string.
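As a rough illustration of the codec difference (the sample string is just an assumed value, not the OP's actual data), the same text can be pushed through three codecs to see which of them accept it:

```python
text = u"Jos\xe9 Carlos"  # \xe9 is the escape for é

# The ascii codec has no byte for é, so encoding raises.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Both of these codecs can represent é.
utf8_bytes = text.encode("utf-8")     # é becomes two bytes: 0xC3 0xA9
cp1252_bytes = text.encode("cp1252")  # é becomes one byte: 0xE9

print(ascii_ok)
print(repr(utf8_bytes))
print(repr(cp1252_bytes))
```

The `except ... ` form used here needs no exception variable, so it should also parse on old Python 2 releases; the `u"..."` literal is accepted by Python 2 and by Python 3.3+.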

PS. all.decode('utf-8') was used above because f.write expects a unicode string. Better than using all.decode('utf-8') would be to convert all your strings to unicode early, work in unicode, and encode to a specific encoding like 'utf-8' late -- only when you have to.

PPS. It looks like names might already be a list of unicode strings. In that case, define delimiter to be a unicode string too: delimiter = u';', so all will be a unicode string. Then

with codecs.open("/tmp/test.txt",'w',encoding='utf-8') as f:   
    f.write(all)

should work (unless there is some issue with Python 2.4 that I'm not aware of.)

If 'utf-8' does not work, remember to try other encodings that contain the characters you need, and that your computer knows about. On Windows, that might mean 'cp1252'.
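A minimal sketch of the whole round trip, assuming names really is a list of unicode strings (the list contents and the temporary file path below are made up for the example). It avoids the with statement and closes the file in try/finally instead, so the same pattern should also apply on interpreters that lack with:

```python
import codecs
import os
import tempfile

# Assumed sample data: every piece is a unicode string, including the delimiter.
names = [u"Jos\xe9 Carlos", u"Jonas", u"Nat\xe1lia Juan", u"John"]
delimiter = u";"
joined = delimiter.join(names)  # unicode in, unicode out

# A throwaway path so the example does not touch /tmp/test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# codecs.open encodes on write, so the unicode string is passed as-is.
f = codecs.open(path, "w", encoding="utf-8")
try:
    f.write(joined)
finally:
    f.close()

# Reading back through the same codec yields the original unicode string.
f = codecs.open(path, "r", encoding="utf-8")
try:
    round_trip = f.read()
finally:
    f.close()

print(round_trip == joined)
```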

unutbu 2010-05-27 20:51:33

Is the with statement available in Python 2.4?

Somebody still uses you MS-DOS 2010-05-27 21:02:37

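@Somebody: Unfortunately, no. It was introduced in Python 2.5 (where it needs `from __future__ import with_statement`). If you are using Python 2.4, open the file without with, using `f = codecs.open("/tmp/test.txt", "w", encoding="utf-8")`, and call `f.close()` yourself when you are done.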

unutbu 2010-05-27 21:11:22

How do I convert all my strings to unicode early if I receive them from a method I don't control? I already receive "José Carlos" in a variable, not as a string literal. When I try unicode(all, "utf-8") I get "TypeError: decoding Unicode is not supported"...

Somebody still uses you MS-DOS 2010-05-27 21:28:17
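That TypeError usually means the value is already a unicode string, so there is nothing left to decode. One possible way to handle a method that may return either kind is a small guard like the sketch below. The helper name to_text and the utf-8 default are assumptions for illustration, not something from the thread:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding only byte strings."""
    if isinstance(value, bytes):
        # Byte string: decode it with the assumed encoding.
        return value.decode(encoding)
    # Already text: decoding again is what triggers the TypeError.
    return value


already_text = to_text(u"Jos\xe9 Carlos")     # returned unchanged
from_bytes = to_text(b"Jos\xc3\xa9 Carlos")   # UTF-8 bytes decoded to text

print(already_text == from_bytes)
```

On Python 2.6 and 2.7 the name bytes is an alias for str, so the check behaves as intended; on Python 2.4 it would have to be written as isinstance(value, str).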

I was just trying to describe my setup; thanks for your "convert all strings early" suggestion...

Somebody still uses you MS-DOS 2010-05-27 21:28:59

@unutbu: I don't know if I did something wrong before, but after adding u' to the delimiter and to the \n's, and removing .decode from all, it worked. I opened my file after the "exporting", and it was correct. Thanks for your help. I'm still no expert on this subject, but your explanation of these issues is a start. I learned a lot from this resource as well: http://www.red-mercury.com/blog/eclectic-tech/python-unicode-fixing-utf-8-encoded-as-latin-1-iso-8859-1/

Somebody still uses you MS-DOS 2010-05-28 13:29:23

Answer 2

A:

You told Python to print all, but since all has no fixed computer representation, Python first had to convert all to some printable form. Since you didn't tell Python how to do the conversion, it assumed you wanted ASCII. Unfortunately, ASCII can only handle values from 0 to 127, and all contains values outside that range, hence the error.

To fix this, use:
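To make the 0 to 127 limit concrete, here is a small sketch that lists the characters in a sample string whose code points fall outside the ASCII range (the string is written as a unicode literal with escapes so it does not depend on the source file encoding):

```python
text = u"Jos\xe9 Carlos;Jonas;Nat\xe1lia Juan;John"

# Collect every character whose code point ASCII cannot represent.
non_ascii = [(ch, ord(ch)) for ch in text if ord(ch) > 127]

for ch, code in non_ascii:
    print(code)
```

Only é (233) and á (225) are reported; every other character in the string is below 128 and would encode to ASCII without trouble.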

all = "José Carlos;Jonas;Natália Juan;John"
import codecs
f = codecs.open("/tmp/test.txt", "w", "utf-8")
f.write(all.decode("utf-8"))
f.close()

Babak Ghahremanpour 2010-05-27 21:01:03

This doesn't work... UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Somebody still uses you MS-DOS 2010-05-27 21:11:40

I added the call to decode and then cut and pasted my code into Python 2.5, in interactive mode, running on my Mac (Mac OS X 10.5.8). It worked perfectly. Are you still having problems? Even if you call decode first?

Babak Ghahremanpour 2010-05-28 02:33:20

The OP started with a Unicode string, so 'all' above should be 'all=u"..."'. Then just 'f.write(all)' and the codec will *encode* the Unicode string to the file.

Mark Tolonen 2010-05-28 02:36:13

@babak; -1 for TWO reasons (1) read Mark's comment (2) your `all` is a str object encoded in who-knows-what encoding; 'utf8' on yours, cp850 in a Western-European-language Windows system, ... copy/paste into the (Command Prompt) interactive interpreter on my box dies in the utf8 decode; it's an *accident* that the OP is on a *x platform and thus the standard encoding is presumably utf8.

John Machin 2010-05-28 05:47:54
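The point about the unknown source encoding can be sketched as follows. The example assumes the same text arrives once as cp850 bytes (as a Western European Windows console might produce) and once as UTF-8 bytes, and shows that decoding with the wrong guess fails:

```python
text = u"Jos\xe9 Carlos"

cp850_bytes = text.encode("cp850")  # what a cp850 console might hand over
utf8_bytes = text.encode("utf-8")   # what a UTF-8 terminal might hand over

# Guessing utf-8 for bytes that are really cp850 raises an error.
try:
    cp850_bytes.decode("utf-8")
    wrong_guess_worked = True
except UnicodeDecodeError:
    wrong_guess_worked = False

# Decoding with the encoding the bytes were actually written in succeeds.
right_guess = cp850_bytes.decode("cp850")

print(wrong_guess_worked)
print(right_guess == text)
```

This is why hard-coding decode("utf-8") on a plain byte string only works where the bytes happen to be UTF-8, whereas starting from a unicode string sidesteps the guess entirely.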


Python unicode problem
