Hi, I'm struggling with print and Unicode conversion. Here is some code executed in the Python 2.5 Windows interpreter:
>>> import sys
>>> print sys.stdout.encoding
cp850
>>> print u"é"
é
>>> print u"é".encode("cp850")
é
>>> print u"é".encode("utf8")
├®
>>> print u"é".__repr__()
u'\xe9'
>>> class A():
...     def __unicode__(self):
...         return u"é"
...
>>> print A()
<__main__.A instance at 0x0000000002AEEA88>
>>> class B():
...     def __repr__(self):
...         return u"é".encode("cp850")
...
>>> print B()
é
>>> class C():
...     def __repr__(self):
...         return u"é".encode("utf8")
...
>>> print C()
├®
>>> class D():
...     def __str__(self):
...         return u"é"
...
>>> print D()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
>>> class E():
...     def __repr__(self):
...         return u"é"
...
>>> print E()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
So, when a unicode string is printed, it is not its __repr__() method that gets called and printed. But when an object is printed, __str__() (or __repr__() if __str__ is not implemented) is called, not __unicode__(). And neither of them can return a unicode string.
But why? If __repr__() or __str__() returns a unicode string, shouldn't the behavior be the same as when we print a unicode string directly? In other words: why is print D() different from print D().__str__()? Am I missing something?
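A plausible reading of the D and E tracebacks: in Python 2, print D() first calls str(D()), and when __str__ (or __repr__) hands back a unicode object, str() converts it to a byte string with the default codec, which is normally 'ascii' (see sys.getdefaultencoding()). print D().__str__(), by contrast, passes the unicode object straight to print, which encodes it with sys.stdout.encoding (cp850 here). The sketch below only mimics that implicit ascii step; implicit_str is a hypothetical name, and the snippet runs on Python 2.6+ and 3.

```python
def implicit_str(text):
    """Hypothetical stand-in for Python 2's str() on a unicode __str__ result.

    Python 2 converts the unicode value with its default codec, normally
    'ascii'. 'ascii' is hardcoded here because Python 3's default is UTF-8.
    """
    return text.encode("ascii")


try:
    implicit_str(u"\xe9")  # u"\xe9" is é
except UnicodeEncodeError as exc:
    # Same kind of error as in the D and E tracebacks above
    print(exc)
```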
These samples also show that if you want to print an object represented with unicode strings, you have to encode it to a byte string (type str). But getting nice output (avoiding the "├®") depends on the encoding of sys.stdout.
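The "├®" can be reproduced without a console: é encoded as UTF-8 is the two bytes 0xC3 0xA9, and a cp850 console draws those bytes as ├ and ®, while the cp850 encoding of é is the single byte 0x82. A small sketch (runs on Python 2.6+ and 3; the \xe9 escape avoids needing a source encoding declaration):

```python
text = u"\xe9"  # é

utf8_bytes = text.encode("utf-8")    # two bytes: 0xC3 0xA9
cp850_bytes = text.encode("cp850")   # one byte: 0x82

# What a cp850 console displays for each byte sequence
mojibake = utf8_bytes.decode("cp850")
print([hex(ord(c)) for c in mojibake])  # ['0x251c', '0xae'], i.e. ├ and ®
print(cp850_bytes.decode("cp850") == text)  # True: cp850 bytes round-trip to é
```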
So, do I have to add u"é".encode(sys.stdout.encoding) to each of my __str__ or __repr__ methods? Or return repr(u"é")?
And what if I use piping? Is the encoding then the same as sys.stdout's?
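One way to avoid hardcoding the codec in every method is a small helper that looks up the target stream's encoding at call time. This is a sketch under assumptions: encode_for_stream, FakeConsole and FakePipe are hypothetical names, the UTF-8 fallback is an arbitrary choice for streams that report no encoding (in Python 2, sys.stdout.encoding is typically None when output is piped or redirected), and the "replace" error handler turns characters the code page cannot represent into "?".

```python
import sys


def encode_for_stream(text, stream=None):
    """Hypothetical helper: encode unicode text for a byte-oriented stream.

    Uses the stream's own encoding when it reports one; otherwise falls
    back to UTF-8 (an assumption, e.g. for pipes and redirects).
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # 'replace' substitutes '?' for characters the codec cannot encode
    return text.encode(encoding, "replace")


class FakeConsole(object):
    """Stand-in for a cp850 Windows console, for illustration only."""
    encoding = "cp850"


class FakePipe(object):
    """Stand-in for a piped stdout that reports no encoding."""
    encoding = None


print(repr(encode_for_stream(u"\xe9", FakeConsole())))  # the single byte 0x82
print(repr(encode_for_stream(u"\xe9", FakePipe())))     # UTF-8 bytes 0xC3 0xA9
```

Looking the encoding up at call time, rather than once at import, means the same object prints correctly whether stdout is the console or a pipe.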
My main issue is making a class "printable", i.e. having print A() print something fully readable (not full of \x*** escapes).
Here is the code with the bad behavior that needs to be fixed:
# -*- coding: utf-8 -*-
class User(object):
    name = u"Luiz Inácio Lula da Silva"

    def __repr__(self):
        # Three alternative attempts; only one return is used at a time.
        # Attempt 1: returns unicode (fails when printed, like class E)
        return "<User: %s>" % self.name
        # Attempt 2: won't display gracefully
        # e.g. print repr(u'é') -> u'\xe9'
        return repr("<User: %s>" % self.name)
        # Attempt 3: won't display gracefully
        # e.g. print u"é".encode("utf8") -> print '\xc3\xa9' -> ├®
        return ("<User: %s>" % self.name).encode("utf8")
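One common pattern for the User class (a sketch, not necessarily the only answer) keeps the readable text in __unicode__ and has __str__ encode it for sys.stdout at call time; on Python 3 the text is returned directly, because there __str__ must return text. The PY2 flag, the UTF-8 fallback for streams without an encoding, and reusing __str__ as __repr__ are assumptions of this sketch; the \xe1 escape (á) means the file needs no coding declaration.

```python
import sys

PY2 = sys.version_info[0] == 2  # assumption: code must run on Python 2 and 3


class User(object):
    name = u"Luiz In\xe1cio Lula da Silva"

    def __unicode__(self):
        # The canonical, human-readable text (unicode on both versions)
        return u"<User: %s>" % self.name

    def __str__(self):
        text = self.__unicode__()
        if PY2:
            # Python 2's str() needs bytes: encode for the current stdout,
            # falling back to UTF-8 when stdout reports no encoding (pipes)
            encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
            return text.encode(encoding, "replace")
        # Python 3's str() must return text, so there is no encoding step
        return text

    # Reuse the same text for repr(); change this if you want an unambiguous repr
    __repr__ = __str__
```

With this in place, print User() on a cp850 console encodes the name with cp850 and should display the accented character, while unicode(User()) on Python 2 still returns the unencoded text.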
Thanks!