views: 198
answers: 3
I have a problem when printing (or writing to a file) non-ASCII characters in Python. I've resolved it by overriding the __str__ method in my own objects and calling "x.encode('utf-8')" inside it, where x is a property of the object.
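For reference, this is roughly what I mean (the class and property names are just examples):

class Item(object):
    def __init__(self, name):
        self.name = name  # a unicode property, e.g. u"a\u00f1o"
    def __str__(self):
        # encode the unicode property to UTF-8 bytes so print / file.write work
        return self.name.encode('utf-8')

print Item(u"a\u00f1o")  # prints UTF-8 bytes; displays correctly on a UTF-8 terminal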

But if I receive a third-party object and call "str(object)" on it, and the object contains a non-ASCII character, it fails.

So the question is: is there any generic way to tell the str method that the object is UTF-8 encoded? I'm working with Python 2.5.4.

+3  A: 

How about using unicode(object) and defining a __unicode__ method on your classes?

Then you know it's unicode, and you can encode it any way you want when writing it to a file.
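A minimal sketch of that approach (the class and file names are just placeholders):

import codecs

class Item(object):
    def __init__(self, name):
        self.name = name  # unicode, e.g. u"caf\u00e9"
    def __unicode__(self):
        return self.name

item = Item(u"caf\u00e9")
text = unicode(item)  # calls __unicode__, stays a unicode object
out = codecs.open("out.txt", "w", "utf-8")
out.write(text)       # encoded to UTF-8 only at the output side
out.close()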

Kugel
Yep. You want unicode strings or py3k.
Paul McMillan
But then I have the same problem: if I receive a third-party object and use "unicode(object)", and the object contains a non-ASCII character, it will fail, won't it?
Roman
Besides, when I use "print(object)", it internally calls __str__, so I can't use unicode
Roman
One more question: if I use Python 3, will I still have these problems? Does Python 3 handle the conversion on its own? Does it accept non-ASCII characters by default?
Roman
All Python 3 strings are (what used to be) unicode by default.
mavnn
First, please realize that if you receive an array of bytes, which Python 2 strings essentially are, there is no way to be sure what encoding it is in. If third-party objects give you strings in a non-standard encoding, they should also tell you which encoding that is.
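A quick illustration of why the encoding has to be declared (the byte string here is made up):

data = '\xc3\xa9'                    # two raw bytes from some third party
print repr(data.decode('utf-8'))     # u'\xe9'  (one character, correct only if it really was UTF-8)
print repr(data.decode('latin-1'))   # u'\xc3\xa9'  (two characters, same bytes read as Latin-1)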
Kugel
+2  A: 

There is no way to make str() work with Unicode in Python < 3.0.

Use repr(obj) instead of str(obj). repr() will convert the result to ASCII, properly escaping everything that isn't in the ASCII code range.
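For example (a quick illustration of the escaping):

line = u"\u0411"   # CYRILLIC CAPITAL LETTER BE
print repr(line)   # prints u'\u0411' -- pure ASCII, safe for any file or console
try:
    str(line)      # the default ASCII codec cannot represent it
except UnicodeEncodeError:
    print "str() failed as expected"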

Other than that, use a file object that accepts unicode, so you encode at the output side rather than the input side:

import codecs
fileObj = codecs.open("someFile", "w", "utf-8")

Now you can write unicode strings to fileObj and they will be converted as needed. To make the same happen with print, you need to wrap sys.stdout:

import sys, codecs, locale
# Show the encoding Python detected for stdout (may be None when output is redirected)
print str(sys.stdout.encoding)
# Wrap stdout so unicode strings are encoded automatically on write
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
line = u"\u0411\n"  # CYRILLIC CAPITAL LETTER BE plus a newline
print type(line), len(line)
sys.stdout.write(line)  # encoded by the wrapper
print line              # print now also goes through the wrapper
Aaron Digulla
But I have the same problem when I use print(object), because it internally calls __str__, so if the object has a non-ASCII character it will fail. I've seen that I can put # -*- coding: utf-8 -*- in the first line of my .py files, but it doesn't work.
Roman
The encoding of the source file has nothing to do with what `str()` supports. `str()` only supports unicode characters in py3k, so either use repr() or unicode() everywhere.
Aaron Digulla
A: 

I'd like to add that I've found a solution on Unix systems: exporting an environment variable, like this:

export LC_CTYPE="es_ES.UTF-8"

This way Python treats the terminal as UTF-8, so I can print or write whatever I want and it works fine
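A quick sanity check that Python actually picked the setting up (the output depends on the terminal and on the locale being installed):

import sys, locale
# run from a terminal after `export LC_CTYPE="es_ES.UTF-8"`
print sys.stdout.encoding             # should report UTF-8 when attached to a UTF-8 terminal
print locale.getpreferredencoding()   # the codec that codecs.getwriter() would pick above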

Roman
What does this have to do with your question? Or with python?
Kugel