views: 198
answers: 3
I have a problem when printing (or writing to a file) non-ASCII characters in Python. I've resolved it by overriding the __str__ method in my own objects and calling "x.encode('utf-8')" inside it, where x is a property of the object.
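For reference, this is roughly what I mean (the class and property names are just examples):

class Item(object):
    def __init__(self, name):
        self.name = name  # a unicode property, e.g. u"a\u00f1o"
    def __str__(self):
        # encode the unicode property to UTF-8 bytes so print / file.write work
        return self.name.encode('utf-8')

print Item(u"a\u00f1o")  # prints UTF-8 bytes; displays correctly on a UTF-8 terminal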

But if I receive a third-party object and call "str(object)" on it, and the object contains a non-ASCII character, it fails.

So the question is: is there any generic way to tell the str method that the object is UTF-8 encoded? I'm working with Python 2.5.4.

+3  A: 

How about using unicode(object) and defining a __unicode__ method on your classes?

Then you know it's unicode, and you can encode it any way you want when writing it to a file.
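A minimal sketch of that approach (the class and file names are just placeholders):

import codecs

class Item(object):
    def __init__(self, name):
        self.name = name  # unicode, e.g. u"caf\u00e9"
    def __unicode__(self):
        return self.name

item = Item(u"caf\u00e9")
text = unicode(item)  # calls __unicode__, stays a unicode object
out = codecs.open("out.txt", "w", "utf-8")
out.write(text)       # encoded to UTF-8 only at the output side
out.close()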

Kugel
Yep. You want unicode strings or py3k.
Paul McMillan
But then I have the same problem: if I receive a third-party object and use "unicode(object)", and the object contains a non-ASCII character, it will fail, won't it?
Roman
Besides, when I use "print(object)", it internally calls __str__, so I can't use unicode
Roman
One more question: if I use Python 3, will I still have these problems? Does Python 3 handle the conversion on its own? Does it accept non-ASCII characters by default?
Roman
All Python 3 strings are (what used to be) unicode by default.
mavnn
First, please realize that if you receive an array of bytes, which Python 2 strings essentially are, there is no way to be sure what encoding it is in. If third-party objects give you strings in a non-standard encoding, they should also tell you which encoding that is.
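A quick illustration of why the encoding has to be declared (the byte string here is made up):

data = '\xc3\xa9'                    # two raw bytes from some third party
print repr(data.decode('utf-8'))     # u'\xe9'  (one character, correct only if it really was UTF-8)
print repr(data.decode('latin-1'))   # u'\xc3\xa9'  (two characters, same bytes read as Latin-1)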
Kugel
+2  A: 

There is no way to make str() work with Unicode in Python < 3.0.

Use repr(obj) instead of str(obj). repr() will convert the result to ASCII, properly escaping everything that isn't in the ASCII code range.
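For example (a quick illustration of the escaping):

line = u"\u0411"   # CYRILLIC CAPITAL LETTER BE
print repr(line)   # prints u'\u0411' -- pure ASCII, safe for any file or console
try:
    str(line)      # the default ASCII codec cannot represent it
except UnicodeEncodeError:
    print "str() failed as expected"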

Other than that, use a file object that accepts unicode, so you encode at the output side rather than the input side:

import codecs
fileObj = codecs.open("someFile", "w", "utf-8")

Now you can write unicode strings to fileObj and they will be converted as needed. To make the same happen with print, you need to wrap sys.stdout:

import sys, codecs, locale
# Show the encoding Python detected for stdout (may be None when output is redirected)
print str(sys.stdout.encoding)
# Wrap stdout so unicode strings are encoded automatically on write
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
line = u"\u0411\n"  # CYRILLIC CAPITAL LETTER BE plus a newline
print type(line), len(line)
sys.stdout.write(line)  # encoded by the wrapper
print line              # print now also goes through the wrapper
Aaron Digulla
But I have the same problem when I use print(object), because it internally calls __str__, so if the object has a non-ASCII character it will fail. I've seen that I can put # -*- coding: utf-8 -*- in the first line of my .py files, but it doesn't work.
Roman
The encoding of the source file has nothing to do with what `str()` supports. `str()` only supports unicode characters in py3k, so either use repr() or unicode() everywhere.
Aaron Digulla
A: 

I'd like to add that I've found a solution on Unix systems: exporting an environment variable, like this:

export LC_CTYPE="es_ES.UTF-8"

This way Python treats the terminal as UTF-8, so I can print or write whatever I want and it works fine
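A quick sanity check that Python actually picked the setting up (the output depends on the terminal and on the locale being installed):

import sys, locale
# run from a terminal after `export LC_CTYPE="es_ES.UTF-8"`
print sys.stdout.encoding             # should report UTF-8 when attached to a UTF-8 terminal
print locale.getpreferredencoding()   # the codec that codecs.getwriter() would pick above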

Roman
What does this have to do with your question? Or with python?
Kugel