It seems to me that the special methods __repr__ and __str__ have an important difference in their base definition.

>>> t2 = u'\u0131\u015f\u0131k'
>>> print t2
ışık
>>> t2
Out[0]: u'\u0131\u015f\u0131k'

t2.decode raises an error since t2 is a unicode string.

>>> enc = 'utf-8'
>>> t2.decode(enc)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\java\python\Python25\Lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

__str__ raises an error as if decode() function is being called:

>>> t2.__str__()
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

but __repr__ works without problem:

>>> t2.__repr__()
Out[0]: "u'\\u0131\\u015f\\u0131k'"

Why does __str__ produce an error whereas __repr__ works properly?

This small difference seems to be causing a bug in a Django application I am working on.

+5  A: 

Basically, __str__ can only output ASCII byte strings. Since t2 contains code points above the ASCII range, it cannot be represented as a plain byte string, so the implicit ASCII encode fails. __repr__, on the other hand, tries to output the Python code needed to recreate the object: you'll see that the output of repr(t2) (this syntax is preferred over t2.__repr__()) is exactly what you set t2 equal to up on the first line. The result of repr is roughly the character sequence ['\\', 'u', '0', ...], all of which are ASCII, whereas the output of str would have to be [unichr(0x0131), unichr(0x015f), unichr(0x0131), 'k'], most of which fall outside the ASCII range.

Generally, when dealing with Django applications, you should use __unicode__ for everything and never touch __str__.
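The same distinction is easy to see in Python 3, where str is always unicode: str() no longer triggers an implicit ASCII encode, and the built-in ascii() produces the escaped, ASCII-only form that Python 2's repr() gave (a sketch, not part of the original answer):

```python
# Python 3 sketch of the repr/str distinction discussed above.
# str is always unicode in Python 3, so str() no longer fails;
# ascii() gives the escaped ASCII-only form that Python 2's repr() gave.
t2 = '\u0131\u015f\u0131k'      # the question's string, ışık

print(str(t2))    # ışık - no implicit ASCII encode in Python 3
print(ascii(t2))  # '\u0131\u015f\u0131k' - ASCII-safe, like Py2 repr

# every character of the ascii() form really is ASCII
assert all(ord(c) < 128 for c in ascii(t2))
```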

More info in the Django documentation on strings.

Michael Fairley
+4  A: 

In general, calling str.__unicode__() or unicode.__str__() is a very bad idea, because bytes can't be safely converted to Unicode code points and vice versa. The exception is ASCII values, which are generally the same in all single-byte encodings. The problem is that you're using the wrong method for the conversion.

To convert unicode to str, you should use encode():

>>> t1 = u"\u0131\u015f\u0131k"
>>> t1.encode("utf-8")
'\xc4\xb1\xc5\x9f\xc4\xb1k'

To convert str to unicode, use decode():

>>> t2 = '\xc4\xb1\xc5\x9f\xc4\xb1k'
>>> t2.decode("utf-8")
u'\u0131\u015f\u0131k'
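In Python 3 the same round trip looks like this (a sketch of the encode/decode pair above; bytes literals replace Python 2's plain str):

```python
# Python 3 version of the encode/decode pair above:
# str.encode() goes from text to bytes, bytes.decode() goes back.
t1 = '\u0131\u015f\u0131k'     # ışık (str is unicode in Python 3)
raw = t1.encode('utf-8')       # b'\xc4\xb1\xc5\x9f\xc4\xb1k'
back = raw.decode('utf-8')
assert back == t1              # lossless round trip
```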
John Millikin
A: 

To add a bit of support to John's good answer:

To understand the naming of the two methods encode() and decode(), you just have to see that Python considers unicode strings of the form u'...' to be the reference format. You encode when going from the reference format into another format (e.g. UTF-8), and you decode when coming from some other format back to the reference format. The unicode format is always considered the "real thing" :-).
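The "reference format" view can be sketched as: every byte encoding is a detour away from unicode and back (Python 3 syntax, where str plays the role of u'...'; the choice of codecs below is illustrative):

```python
# Unicode text as the reference format: each byte encoding is a
# detour away from it (encode) and back (decode).
text = '\u0131\u015f\u0131k'   # ışık

# iso-8859-9 (Latin-5) is the single-byte Turkish codec, chosen here
# because it covers ı and ş; the round trip holds for any codec that
# can represent the text.
for enc in ('utf-8', 'utf-16', 'iso-8859-9'):
    assert text.encode(enc).decode(enc) == text
```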

ThomasH
A: 

Note that in Python 3, unicode is the default (str is a unicode string, and there is no separate unicode type), and __str__() should always give you unicode.
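A quick Python 3 check of that claim (a sketch): str and bytes are fully separated, so the implicit ASCII conversions that caused the errors in the question can no longer happen:

```python
# Python 3 keeps text and bytes strictly apart: str has no decode()
# and bytes has no encode(), so the implicit ASCII conversions from
# Python 2 (and the UnicodeEncodeErrors above) are impossible.
assert not hasattr('\u0131\u015f\u0131k', 'decode')  # str: no decode()
assert not hasattr(b'\xc4\xb1', 'encode')            # bytes: no encode()

# __str__ on a str just returns the unicode text itself
assert str('\u0131\u015f\u0131k') == '\u0131\u015f\u0131k'
```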

A. L. Flanagan