What could cause the following behavior?

>>> print str(msg)
my message
>>> print unicode(msg)
my message

But:

>>> print '%s' % msg
another message

More info:

  • my msg object inherits from unicode.
  • the __str__, __unicode__ and __repr__ methods were overridden to return the string 'my message'.
  • the msg object was initialised with the string 'another message'.
  • this is running on Python 2.5.
  • the variable msg was not changed between the tests.
  • this is an actual doctest that really gives these results.

I would like a solution that makes this doctest pass, with minimal fuss (especially around the actual inheritance):

>>> print '%s' % msg
my message

Thanks for all suggestions.

I don't feel this will help much, but for curious readers (and adventurous Pythonistas), here's the implementation of the object:

class Message(zope.i18nmessageid.Message):

    def __repr__(self):
        return repr(zope.i18n.interpolate(self.default, self.mapping))

    def __str__(self):
        return zope.i18n.interpolate(self.default, self.mapping)

    def __unicode__(self):
        return zope.i18n.interpolate(self.default, self.mapping)

This is how we create the object msg:

>>> msg = Message('another message', 'mydomain', default='my message')

Zope packages version and code used are:

EDIT INFO:

  • added/updated the names of the methods that were overridden
  • added some more info (Python version, and minor info)
  • updated some wrong info (the class of `msg` is based on `unicode` class and not `basestring`)
  • added the actual implementation of the class used
+5  A: 

Update 2: Please find the original answer, including a simple example of a class exhibiting the behaviour described by the OP, below the horizontal bar. As for what I was able to surmise in the course of my inquiry into Python's sources (v. 2.6.4):

The file Include/unicodeobject.h contains the following two lines (nos. 436-7 in my (somewhat old) checkout):

#define PyUnicode_AS_UNICODE(op) \
        (((PyUnicodeObject *)(op))->str)

This is used all over the place in the formatting code, which, as far as I can tell, means that during string formatting, any object which inherits from unicode will be reached into so that its unicode string buffer may be used directly, without calling any Python methods. Which is good as far as performance is concerned, I'm sure (and very much in line with Juergen's conjecture in a comment on this answer).

For the OP's question, this probably means that making things work the way the OP would like may only be possible if something like Anurag Uniyal's wrapper class idea is acceptable for this particular use case. If it isn't, the only thing that comes to my mind now is to wrap objects of this class in str / unicode wherever they're being interpolated into a string... ugh. (I sincerely hope I'm just missing a cleaner solution which someone will point out in a minute!)
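The "wrap in str / unicode at the point of interpolation" workaround mentioned above could look roughly like the sketch below. `Msg` and `text_type` are hypothetical names introduced only for illustration; `Msg` is a stand-in for the OP's class, not the real zope-based implementation. The idea is simply to convert explicitly first, so that the overridden method runs, and only then pass the resulting plain text object to `%`.

```python
# Pick the native text type without naming `unicode`, so the sketch
# runs on both Python 2 and Python 3.
text_type = type(u"")


class Msg(text_type):
    """Hypothetical stand-in for the OP's class: a text subclass whose
    string conversions ignore the value it was initialised with."""

    def __str__(self):
        return "my message"

    def __unicode__(self):  # only consulted by Python 2's unicode()
        return u"my message"


msg = Msg(u"another message")

# Convert explicitly, then interpolate the resulting plain text object.
rendered = u"%s" % text_type(msg)
print(rendered)
```

This should print `my message`, but it has to be repeated at every interpolation site, which is exactly the "ugh" part.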


(Update: This was posted about a minute before the OP included the code of his class, but I'm leaving it here anyway (1) for the conjecture / initial attempt at an explanation below the code, (2) for a simple example of how to produce this behaviour (Anurag Uniyal has since provided another one calling unicode's constructor directly, as opposed to via super), (3) in hope of later being able to edit in something to help the OP in obtaining the desired behaviour.)

Here's an example of a class which actually works like what the OP describes (Python 2.6.4, it does produce a deprecation warning -- /usr/bin/ipython:3: DeprecationWarning: object.__init__() takes no parameters):

class Foo(unicode):
    def __init__(self, msg):
        super(unicode, self).__init__(msg)
    def __str__(self): return 'str msg'
    def __repr__(self): return 'repr msg'
    def __unicode__(self): return u'unicode msg'

A couple of interactions in IPython:

In [12]: print(Foo("asdf"))
asdf

In [13]: str(Foo("asdf"))
Out[13]: 'str msg'

In [14]: print str(Foo("asdf"))
-------> print(str(Foo("asdf")))
str msg

In [15]: print(str(Foo("asdf")))
str msg

In [16]: print('%s' % Foo("asdf"))
asdf

Apparently string interpolation treats this object as an instance of unicode (directly calling the unicode implementation of __str__), whereas the other functions treat it as an instance of Foo. How this happens internally and why it works like this and whether it's a bug or a feature, I really don't know.

As for how to fix the OP's object... Well, how would I know without seeing its code??? Give me the code and I promise to think about it! Ok, I'm thinking about it... No ideas so far.

Michał Marczyk
It looks to me as if print has taken some shortcut, to speed things up, I would think. Python has (relatively fast) internal interfaces and (relatively slow) external interfaces. I guess that somebody tried to avoid the overhead ...
Juergen
@Juergen: Included some info on what the sources look like in the answer now... It certainly seems that you're right.
Michał Marczyk
@Michal: Thanks for the info! Python is rather clean as a system, but (as far as I understand it, and from the little I have seen of it) some shortcuts are sometimes made internally where a big speed advantage can be gained. This is OK in my opinion, since those shortcuts are not visible in 99% of all cases ... in the other 1%, a workaround must be made, like in this case. Of course, when stumbling over one, it can be rather surprising or even annoying ...
Juergen
@Juergen: Agreed. Perhaps for cases like this some flag (visible from C) could be set on the object to indicate that it overrides a builtin's methods and thus needs to be treated in the slow way... I'm hardly competent to judge though. I guess if vaab goes on to make the bug report, then we'll find out what the Python team's opinion is.
Michał Marczyk
+4  A: 

So the problem is that a class like the one below behaves weirdly:

class Msg(unicode):
    def __init__(self, s):
        unicode.__init__(self, s)

    __unicode__ = __repr__ = __str__ = lambda self: "my message"

msg = Msg("another message")
print str(msg)
print unicode(msg)
print "%s"%msg

this prints

my message
my message
another message

I am not sure why this happens or how to fix it, but here is a very crude attempt that wraps Msg. I am not sure it will help with the OP's problem:

class MsgX(object):
    def __init__(self, s):
        self._msg = Msg(s)

    __unicode__ = __repr__ = __str__ = lambda self: repr(self._msg)

msg = MsgX("another message")
print str(msg)
print unicode(msg)
print "%s"%msg

output:

my message
my message
my message
Anurag Uniyal
I cannot afford to change the inheritance from unicode. However, thanks for your simplified example.
vaab
@vaab: if you look at the extended answer I gave, the addition of `__getattr__` will forward all accessors that *would* have been resolved by inheritance to the contained .msg attribute. This is a very powerful idiom in Python, and puts wrap-and-delegate on par with inheritance, with less coupling.
Paul McGuire
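A minimal sketch of the wrap-and-delegate idiom described in the comment above, under stated assumptions: `MessageProxy`, `_wrapped` and `_rendered` are hypothetical names chosen for illustration and are not taken from any answer in this thread. The wrapper controls rendering itself and uses `__getattr__` to forward every other attribute to the object it contains.

```python
class MessageProxy(object):
    """Hypothetical wrapper: holds a text object and controls how it renders."""

    def __init__(self, wrapped, rendered):
        self._wrapped = wrapped    # the original text object
        self._rendered = rendered  # what str() and '%s' should produce

    def __str__(self):
        return self._rendered

    __unicode__ = __str__  # Python 2's unicode() looks for this name

    def __repr__(self):
        return repr(self._rendered)

    def __getattr__(self, name):
        # Called only when normal lookup fails, so every attribute the
        # proxy does not define itself is forwarded to the wrapped object.
        return getattr(self._wrapped, name)


msg = MessageProxy("another message", "my message")

print("%s" % msg)            # rendering goes through __str__
print(msg.upper())           # forwarded to the wrapped string
print(isinstance(msg, str))  # False: the proxy is not a string
```

Two caveats apply. Implicit special-method lookups (for example `len(msg)` or `msg + "x"`) bypass `__getattr__` on new-style classes, so those would need explicit forwarding methods. The proxy is also not an instance of any string type, so code that relies on `isinstance` checks will not accept it.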
+3  A: 
Paul McGuire
Great idea, but this does not work! ;) Well, it works for the given doctest, but the fact that this class is no longer an instance of `string` breaks some other C checks in common Python libraries that I use and need to use. I'll be clearer tomorrow.
vaab
Ah, you (or those libs) are using isinstance, perhaps? And now this class no longer inherits from basestring? Hmmm, those isinstance checks wouldn't happen to be doing parameter validation would they? This is an excellent case showing why isinstance parameter checking is not always the best idea in Python.
Paul McGuire