I'm receiving some data from a ZODB (Zope Object Database). I receive a mybrains object. Then I do:

o = mybrains.getObject()

and I receive a "Person" object in my project. Then, I can do

b = o.name

and when I do print b in my class I get:

José Carlos

and print b.__class__ gives:

<type 'unicode'>

I have a lot of "Person" objects. They are added to a list.

names = [o.name, o1.name, o2.name]

Then I try to create a text file with this data.

delimiter = ';'
all = delimiter.join(names) + '\n'

No problem. Now, when I do print all I get:

José Carlos;Jonas;Natália
Juan;John

But when I try to create a file of it:

f = open("/tmp/test.txt", "w")
f.write(all)

I get an error like this (the positions aren't exactly the same, since I changed the names):

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 84: ordinal not in range(128)

If I can already print it in the "correct" form, why can't I write it to a file? Which encode/decode method should I use to write a file with this data?

I'm using Python 2.4.5 (can't upgrade it)

+2  A: 

UnicodeEncodeError: 'ascii' codec

write is trying to encode the string using the ascii codec, which doesn't have a way of encoding accented characters like é or à.

Instead use

import codecs
with codecs.open("/tmp/test.txt",'w',encoding='utf-8') as f:   
    f.write(all.decode('utf-8'))

or choose some other codec (like cp1252) which can encode the characters in your string.
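
To see which codecs can represent the names, here is a minimal sketch. The sample text and the helper can_encode are made up for illustration; they are not from the question's code.

```python
# Sample text standing in for the joined names (u"José Carlos;Natália" written with escapes).
text = u"Jos\xe9 Carlos;Nat\xe1lia"

def can_encode(value, codec):
    """Hypothetical helper: True if `value` can be encoded with `codec`."""
    try:
        value.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

ascii_ok = can_encode(text, "ascii")    # False: the accented letters are outside 0-127
utf8_ok = can_encode(text, "utf-8")     # True
cp1252_ok = can_encode(text, "cp1252")  # True

# The same text becomes different bytes depending on the codec chosen.
utf8_bytes = text.encode("utf-8")       # each accented letter takes two bytes
cp1252_bytes = text.encode("cp1252")    # each accented letter takes one byte
```

Whichever codec you pick for writing is the one a reader of the file has to use to decode it again.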

PS. all.decode('utf-8') was used above because f.write expects a unicode string. Better than using all.decode('utf-8') would be to convert all your strings to unicode early, work in unicode, and encode to a specific encoding like 'utf-8' late -- only when you have to.

PPS. It looks like names might already be a list of unicode strings. In that case, define delimiter to be a unicode string too: delimiter = u';', so all will be a unicode string. Then

with codecs.open("/tmp/test.txt",'w',encoding='utf-8') as f:   
    f.write(all)

should work (unless there is some issue with Python 2.4 that I'm not aware of.)
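
Since the with statement is not available in Python 2.4, the same idea can probably be written with try/finally. This is only a sketch under a few assumptions: the names are sample data rather than real Person objects, the file goes to a temporary directory instead of /tmp/test.txt, and the variable is called line so it does not shadow the built-in all().

```python
import codecs
import os
import tempfile

# Sample data standing in for the unicode names read from the ZODB objects.
names = [u"Jos\xe9 Carlos", u"Jonas", u"Nat\xe1lia"]
delimiter = u";"                      # unicode delimiter keeps the result unicode
line = delimiter.join(names) + u"\n"

# Hypothetical output path for the demo.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

f = codecs.open(path, "w", encoding="utf-8")
try:
    f.write(line)   # the codec encodes the unicode string to UTF-8 bytes
finally:
    f.close()       # runs even if write() raises

# Read the file back with the same codec to confirm the round trip.
f = codecs.open(path, "r", encoding="utf-8")
try:
    read_back = f.read()
finally:
    f.close()
```

No .decode() call is needed here, because everything handed to f.write is already unicode.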

If 'utf-8' does not work, remember to try other encodings that contain the characters you need, and that your computer knows about. On Windows, that might mean 'cp1252'.

unutbu
Is the with statement available in Python 2.4?
Somebody still uses you MS-DOS
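@Somebody: Unfortunately, no. It was introduced in Python 2.5 (enabled there with `from __future__ import with_statement`). With Python 2.4 you have to open the file with a plain assignment, e.g. `f = codecs.open("/tmp/test.txt", "w", encoding="utf-8")`, and call `f.close()` yourself.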
unutbu
How do I convert all my strings to unicode early if I receive them from a method I don't control? I already receive "José Carlos" in a variable, not as a string literal. When I try unicode(all, "utf-8") I get "TypeError: decoding Unicode is not supported"...
Somebody still uses you MS-DOS
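On that TypeError: unicode(x, "utf-8") only accepts byte strings, so it raises when x is already unicode, which suggests the values coming from the method are unicode already. One way to normalize values from code you don't control is to decode only when the value is a byte string. A minimal sketch; to_unicode is a hypothetical helper, and UTF-8 as the default encoding is an assumption about your data:

```python
# Pick the text type for the running interpreter.
try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3

def to_unicode(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as text, decoding only byte strings."""
    if isinstance(value, text_type):
        return value                  # already text, nothing to decode
    return value.decode(encoding)     # byte string: decode with the assumed encoding

already_text = to_unicode(u"Jos\xe9 Carlos")                  # returned unchanged
from_bytes = to_unicode(u"Jos\xe9 Carlos".encode("utf-8"))    # decoded from UTF-8 bytes
```

Both calls end up with the same text, so the rest of the code can work in unicode without caring where the value came from.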
I was just trying to describe my setup. Thanks for your "convert all strings early" suggestion...
Somebody still uses you MS-DOS
@unutbu: I don't know if I did something wrong before, but after adding u' to the delimiter and to the \n's, and removing .decode from all, it worked. I opened my file after the export and the content was correct. Thanks for your help. I'm still no expert on this subject, but your explanation of these issues is a good start. I also learned a lot from this resource: http://www.red-mercury.com/blog/eclectic-tech/python-unicode-fixing-utf-8-encoded-as-latin-1-iso-8859-1/
Somebody still uses you MS-DOS
A: 

You told Python to write all to a file, but a unicode string has no fixed byte representation, so Python first had to encode it. Since you didn't tell Python which encoding to use, it fell back to ASCII. Unfortunately, ASCII can only handle values from 0 to 127, and all contains values outside that range, hence the error.

To fix this use:

all = "José Carlos;Jonas;Natália Juan;John"
import codecs
f = codecs.open("/tmp/test.txt", "w", "utf-8")
f.write(all.decode("utf-8"))
f.close()
Babak Ghahremanpour
This doesn't work... UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Somebody still uses you MS-DOS
I added the call to decode and then cut and pasted my code into Python 2.5, in interactive mode, running on my Mac (Mac OS X 10.5.8). It worked perfectly. Are you still having problems? Even if you call decode first?
Babak Ghahremanpour
The OP started with a Unicode string, so 'all' above should be 'all=u"..."'. Then just 'f.write(all)' and the codec will *encode* the Unicode string to the file.
Mark Tolonen
@babak; -1 for TWO reasons (1) read Mark's comment (2) your `all` is a str object encoded in who-knows-what encoding; 'utf8' on yours, cp850 in a Western-European-language Windows system, ... copy/paste into the (Command Prompt) interactive interpreter on my box dies in the utf8 decode; it's an *accident* that the OP is on a *x platform and thus the standard encoding is presumably utf8.
John Machin
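To illustrate the point about platform encodings, here is a rough sketch of why decoding a pasted byte string with a hard-coded 'utf-8' only works on some machines. The sample text is made up, and cp850 is used as the example Windows console encoding mentioned above.

```python
text = u"Jos\xe9"   # sample name, written with an escape so the source encoding doesn't matter

utf8_bytes = text.encode("utf-8")    # bytes as a UTF-8 terminal would supply them
cp850_bytes = text.encode("cp850")   # bytes as a cp850 console would supply them

# Decoding succeeds when the codec matches how the bytes were produced.
utf8_roundtrip = utf8_bytes.decode("utf-8")

# Decoding the cp850 bytes as UTF-8 fails, because the byte for the
# accented letter is not a valid UTF-8 sequence on its own.
try:
    cp850_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

Starting from a unicode literal, as suggested above, sidesteps the problem because no guess about the console encoding is needed.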