ansaurus

Question

Best output type and coding practices for __repr__() functions?

Answer 1

+5 A:

In Python2, __repr__ (and __str__) must return a string object, not a unicode object. In Python3, the situation is reversed: __repr__ and __str__ must return unicode objects, not byte (née string) objects:

class Foo(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}' 

class Bar(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'.encode('utf8')

print(repr(Bar()))
# ☺ (on a UTF-8 terminal)
repr(Foo())
# UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)

In Python2, you don't really have a choice. You have to pick an encoding for the return value of __repr__.
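
For comparison, here is a minimal sketch of the Python 3 half of that rule. It assumes a Python 3 interpreter; the class names Smiley and BytesSmiley are made up for illustration.

```python
# Python 3 sketch: __repr__ must return str (text), never bytes.
# The class names are hypothetical, chosen only for this example.

class Smiley(object):
    def __repr__(self):
        # A str containing a non-ASCII character is fine in Python 3.
        return '\N{WHITE SMILING FACE}'


class BytesSmiley(object):
    def __repr__(self):
        # Returning bytes is rejected when repr() is called.
        return '\N{WHITE SMILING FACE}'.encode('utf8')


text = repr(Smiley())

try:
    repr(BytesSmiley())
    error = None
except TypeError as exc:
    # repr() refuses the bytes return value with a TypeError.
    error = exc
```

So the roles are swapped relative to Python 2: the text type is accepted and the byte type is refused.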

By the way, have you read the PrintFails wiki? It may not directly answer your other questions, but I did find it helpful in illuminating why certain errors occur.

When using from __future__ import unicode_literals,

('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8')

can be more simply written as

str('<{}>').format(repr(x))

assuming str encodes to utf-8 on your system.

Without from __future__ import unicode_literals, the expression can be written as:

'<{}>'.format(repr(x))
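
The equivalence between the long and the short expression can be checked with a rough Python 3 sketch, in which bytes stands in for Python 2's str and str stands in for Python 2's unicode. This is only an emulation under that assumption, not the Python 2 code itself; the variable names are illustrative.

```python
# Python 3 emulation of the Python 2 expressions above.
# Assumption: repr(x) in Python 2 returned UTF-8 encoded bytes.
repr_bytes = '\N{WHITE SMILING FACE}'.encode('utf-8')  # stand-in for repr(x)

# Long form: decode the bytes, format them into a text template, encode again.
long_form = ('<{}>'.format(repr_bytes.decode('utf-8'))).encode('utf-8')

# Short form: format the bytes directly into a byte-string template.
# bytes %-formatting requires Python 3.5 or later.
short_form = b'<%s>' % repr_bytes

assert long_form == short_form
```

Both forms give the same bytes because the template itself is pure ASCII, so the decode and encode steps cancel out.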

unutbu 2010-09-02 14:01:23

It would be nice if the documentation mentioned this :) (http://docs.python.org/reference/datamodel.html#basic-customization does not)… Anyway… you would say that the approach in point 4 in the question is cumbersome but necessary, right?

EOL 2010-09-02 14:11:54

EOL: Assuming Python2, `repr(x)` must return an encoded string. If it was encoded with utf-8, then `repr(x).decode('utf8').encode('utf8')` should not be necessary. If `repr(x)` is encoded with some other encoding, `repr(x).decode('utf8')` will either fail (with UnicodeDecodeError) or produce bogus results, or maybe decode correctly by lucky happenstance. So, AFAIK, `repr(x).decode('utf8').encode('utf8')` should never be necessary. Can you provide an example?

unutbu 2010-09-02 14:23:09
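
The three outcomes described in the comment above (failure, bogus result, lucky success) can be reproduced with a small sketch. It is written for Python 3, using bytes where Python 2 would have a str, and the sample strings are chosen only for illustration.

```python
# Decoding bytes as UTF-8 when they were encoded with Latin-1.

# 1. Failure: the Latin-1 byte 0xE9 is not valid UTF-8 on its own.
latin1_cafe = 'caf\xe9'.encode('latin-1')      # b'caf\xe9'
try:
    latin1_cafe.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

# 2. Bogus result: these Latin-1 bytes happen to be valid UTF-8,
#    but they decode to different text than what was encoded.
original = '\xc3\xa9'                          # two characters: U+00C3, U+00A9
bogus = original.encode('latin-1').decode('utf-8')

# 3. Lucky happenstance: pure ASCII text is identical in both encodings.
lucky = 'abc'.encode('latin-1').decode('utf-8')
```

Only the third case round-trips correctly, and only because the text never leaves the ASCII range.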

@EOL, **The return value must be a string object.** is how the reference manual page you point to expresses the constraint that the return value must be an instance of `str` (a unicode object would not be "a string object"). `repr` is _normally_ expected to return ascii only (think of `repr(uo)` for all unicode objects, for example: even _that_ returns ascii only -- I think no built-in or standard library type behaves otherwise) but strictly speaking that is not a language constraint, so it's not the reference manual's business. Proposed docs patches are always welcome, btw!-)

Alex Martelli 2010-09-02 14:29:01
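
One way to follow the ASCII-only convention mentioned in the comment above is to escape non-ASCII characters inside `__repr__`. The sketch below assumes Python 3, where the built-in `ascii()` does the escaping; the `Tag` class and its `label` attribute are hypothetical.

```python
# Python 3 sketch of an ASCII-only __repr__.
# Tag and label are made-up names for this example.

class Tag(object):
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        # ascii() is like repr(), but escapes every non-ASCII character.
        return 'Tag({})'.format(ascii(self.label))


result = repr(Tag('\N{WHITE SMILING FACE}'))
is_ascii_only = all(ord(ch) < 128 for ch in result)
```

This mirrors what Python 2's `repr()` does for unicode objects, where a smiley comes out as an escape sequence rather than as the character itself.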

@Alex: Thank you for the comments. I guess that my confusion comes from the fact that one also says "Unicode string", in Python 2.x: that's why I was wondering whether `__repr__()` could also return a *Unicode* string… I have been thinking of submitting doc patches. :)

EOL 2010-09-02 14:50:41

@~unutbu: I should have put parentheses in the example, which differs from what you put in the comment: the decoded object is put *into a formatting string* before encoding. I updated the original question.

EOL 2010-09-02 14:51:51

@EOL, yes, I find string-related terminology ("string", "unicode string", "raw string", ...) unfortunately at risk of ambiguity in common discourse -- I _try_ to always use rigorously non-ambiguous terms such as "str instance", "unicode object", "rawstring _literal_ ", and so forth, but sometimes such rigorous terminology feels stilted in non-formal contexts. In the Language Reference, the only occurrences of the unfortunate "unicode string" are in a single paragraph in 2.4.1 (literals): s/string/object/ there and "string" becomes unambiguous *in the Language Reference* (where it matters).

Alex Martelli 2010-09-02 18:06:35

It's also possible that the Language Reference is _deliberately_ ambiguous because it's **not** supposed to be a Reference for **CPython** only, but for _all_ conforming Python implementations: in Jython and IronPython, which we're very keen to consider fully conforming implementations, **all** strings are Unicode (and it would be costly and totally against their respective platforms to make things otherwise). Maybe we do need a supplemental **CPython** implementation-specific reference, as an _addition_ to the implementation-neutral **Language** one.

Alex Martelli 2010-09-02 18:09:03

@~unutbu: since `from __future__ import unicode_literals` is in force, '<{}>' *is* a Unicode string. So, it looks again like you're confirming that what I'm doing is correct; it's good to get such a confirmation. I'll mark your answer as accepted if you can remove the part that assumes that '<{}>' is a str.

EOL 2010-09-03 11:47:33

@EOL: Ah, I forgot about `unicode_literals`. Yes, I agree with you then. If you didn't have `unicode_literals` turned on, however, you could write `'<{}>'.format(repr(x))` instead of `('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8')`. Are you sure that `from __future__ import unicode_literals` is worth it?

unutbu 2010-09-03 12:03:35

Of course, `str('<{}>').format(repr(x))` would also work... See http://stackoverflow.com/questions/809796/any-gotchas-using-unicode-literals-in-python-2-6

unutbu 2010-09-03 12:04:51

@~unutbu: Unicode with Python 2.x *is* tricky: `'<{}>'.format(repr(x))` does *not* work when you have bytes with value > 127 in the representation (because the literal creates a Unicode object)! Thank you for the `str(…).format()` suggestion. As for the `from __future__`, I like the fact that string literals are Unicode objects, because these objects correspond to Python 3's strings (one of the goals is to prepare the transition to Python 3).

EOL 2010-09-03 12:28:39

@EOL: I'm not sure that `from __future__ import unicode_literals` is helping you prepare for Python3. Think about what your code should look like in Python3. It would just be `'<{}>'.format(repr(x))`. Anything you write that deviates from that, even `str('<{}>').format(repr(x))`, is just cruft that will have to be fixed during the transition. Are you sure that `'<{}>'.format(repr(x))` does not work if you turn off `unicode_literals`?

unutbu 2010-09-03 12:52:28
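
The Python 3 target form mentioned in the comment above can be sketched as follows. It assumes Python 3; the `Smiley` class is a hypothetical stand-in for the object being wrapped.

```python
# Python 3 sketch: no decode/encode step is needed around repr().
# Smiley is a made-up class for illustration.

class Smiley(object):
    def __repr__(self):
        # Non-ASCII text is allowed because __repr__ returns str.
        return 'Smiley(\N{WHITE SMILING FACE})'


x = Smiley()

# The plain form from the comment.
wrapped = '<{}>'.format(repr(x))

# Equivalent spelling using the !r conversion of str.format.
wrapped_conversion = '<{!r}>'.format(x)
```

Both spellings produce the same text, so the choice between them is a matter of style.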

@~unutbu: good point, about the simpler code when not using `unicode_literals`. I'll turn it off (in which case the simpler code does indeed work). If you can remove the part with "may be incorrect" (which refers to a different situation than that of the question, which assumed Unicode literals), I'll mark your answer as accepted.

EOL 2010-09-04 09:20:38

@EOL: agreed. Best of luck with your work.

unutbu 2010-09-04 13:03:28
