ansaurus

Question

Answer 1

A:

Same as unicode(str(1)).

>>> class thing(object):
...     def __str__(self):
...         print "__str__ called on " + repr(self)
...         return repr(self)
...
>>> a = thing()
>>> a
<__main__.thing object at 0x7f2f972795d0>
>>> unicode(a)
__str__ called on <__main__.thing object at 0x7f2f972795d0>
u'<__main__.thing object at 0x7f2f972795d0>'

If you really want to see the gritty bits underneath, open up the Python interpreter source code.

Objects/unicodeobject.c#PyUnicode_Type defines the unicode type, with constructor .tp_new=unicode_new.

Since neither of the optional arguments encoding and errors is given, and a unicode object is being constructed (as opposed to an instance of a unicode subclass), Objects/unicodeobject.c#unicode_new calls PyObject_Unicode.

Objects/object.c#PyObject_Unicode calls the __unicode__ method if it exists. If not, it falls back to Py_TYPE(v)->tp_str (a.k.a. __str__), or to Py_TYPE(v)->tp_repr (a.k.a. __repr__) if the type has no tp_str. If the result is not already a unicode object, it then passes it to PyUnicode_FromEncodedObject.
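To make that lookup order concrete, here is a minimal sketch that mimics it in pure Python 3. It is not CPython's code: the function name call_conversion_method is hypothetical, __unicode__ is just an ordinary method name in Python 3, and looking the method up on the class is a simplification. It returns the raw result, before any decoding step; the classes are loosely modelled on the ones in Answer 2.

```python
# Hypothetical sketch (Python 3) of the lookup order in Python 2's
# PyObject_Unicode; this is not CPython's implementation.


def call_conversion_method(obj):
    """Return (method_name, raw_result) for obj, trying __unicode__ first."""
    cls = type(obj)
    # 1. Use __unicode__ if the class defines one. (Looking it up on the
    #    class is a simplification of what the C code does.)
    unicode_method = getattr(cls, "__unicode__", None)
    if unicode_method is not None:
        return "__unicode__", unicode_method(obj)
    # 2. Otherwise use __str__ (the tp_str slot) ...
    str_method = getattr(cls, "__str__", None)
    if str_method is not None:
        return "__str__", str_method(obj)
    # 3. ... or __repr__ (tp_repr) when there is no tp_str. Every Python 3
    #    class inherits __str__ from object, so this branch is not reached here.
    return "__repr__", repr(obj)


class A(int):
    # Defines both methods: __unicode__ wins.
    def __str__(self):
        return "A.str"

    def __unicode__(self):
        return "A.unicode"


class B(int):
    # Defines only __str__: it is used as the fallback.
    def __str__(self):
        return "B.str"


print(call_conversion_method(A(1)))  # ('__unicode__', 'A.unicode')
print(call_conversion_method(B(1)))  # ('__str__', 'B.str')
```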

Objects/unicodeobject.c#PyUnicode_FromEncodedObject finds that it was given a str (a byte string) and passes it on to PyUnicode_Decode, which decodes it with the default encoding (no encoding was specified) and returns a unicode object.

Finally, PyObject_Unicode returns to unicode_new, which returns this unicode object.

In short, unicode() will automatically stringify your object if it needs to. This is Python working as expected.

ephemient 2010-02-03 02:09:04

I mean... what happens internally.

Juanjo Conti 2010-02-03 02:12:27

Answer 2

A:

If __unicode__ exists, it is called; otherwise unicode() falls back to __str__.

class A(int):
    def __str__(self):
        print "A.str"
        return int.__str__(self)

    def __unicode__(self):
        print "A.unicode"
        return int.__str__(self)

class B(int):
    def __str__(self):
        print "B.str"
        return int.__str__(self)


unicode(A(1)) # prints "A.unicode"
unicode(B(1)) # prints "B.str"

gnibbler 2010-02-03 02:35:41

Answer 3

A:

If there is no __unicode__ method, the __str__ method is called instead. Whichever of these methods is called, if it returns a unicode object, that object is passed on as-is. If it returns a str, the str is decoded using the default encoding, as returned by sys.getdefaultencoding(), which should almost always be 'ascii'. If it returns any other kind of object, a TypeError is raised.
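A minimal Python 3 sketch of these return-value rules, assuming Python 3's str stands in for Python 2's unicode and bytes stands in for Python 2's str. The names coerce_to_text and DEFAULT_ENCODING are hypothetical, 'ascii' is hard-coded in place of sys.getdefaultencoding(), and the TypeError message is made up rather than copied from Python 2.

```python
# Hypothetical sketch (Python 3) of the return-value rules in Python 2.
# Python 3's str stands in for Python 2's unicode; bytes stands in for str.

DEFAULT_ENCODING = "ascii"  # stand-in for Python 2's sys.getdefaultencoding()


def coerce_to_text(result, encoding=DEFAULT_ENCODING):
    """Turn the value returned by __unicode__/__str__ into text."""
    if isinstance(result, str):
        # Already text (a Python 2 unicode object): passed on as-is.
        return result
    if isinstance(result, bytes):
        # A byte string (a Python 2 str): decoded strictly with the default
        # encoding, so non-ASCII bytes raise UnicodeDecodeError.
        return result.decode(encoding, "strict")
    # Anything else is rejected (message is illustrative only).
    raise TypeError("expected text or bytes, got %s" % type(result).__name__)


print(ascii(coerce_to_text("caf\u00e9")))  # 'caf\xe9'
print(coerce_to_text(b"abc"))              # abc
try:
    coerce_to_text(b"caf\xc3\xa9")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)  # ordinal not in range(128)
try:
    coerce_to_text(42)
except TypeError as exc:
    print("TypeError:", exc)  # expected text or bytes, got int
```

The strict ASCII decode is what makes non-ASCII byte strings returned from __str__ fail, which is why returning a unicode object from __unicode__ is the safer choice.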

(It is possible, by reloading the sys module, to change the default encoding by calling sys.setdefaultencoding(); this is basically always a bad idea.)

mithrandi 2010-02-03 02:36:10

unicode class in Python
