Lately, I've had lots of trouble with __repr__(), format(), and encodings. Should the output of __repr__() be encoded or be a unicode string? Is there a best encoding for the result of __repr__() in Python? What I want to output does have non-ASCII characters.
I use Python 2.x and want to write code that can easily be adapted to Python 3. The program therefore starts with:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals, print_function # The 'Hello' literal represents a Unicode object
Here are some problems that have been bothering me, and I'm looking for a solution that solves them:
- Printing to a UTF-8 terminal should work (I have sys.stdout.encoding set to UTF-8, but it would be best if other cases worked too).
- Piping the output to a file (encoded in UTF-8) should work (in this case, sys.stdout.encoding is None).
- My code for many __repr__() functions currently has many return ….encode('utf-8'), and that's heavy. Is there anything robust and lighter?
- In some cases, I even have ugly beasts like return ('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8'), i.e., the representation of objects is decoded, put into a formatting string, and then re-encoded. I would like to avoid such convoluted transformations.
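To make the last two points concrete, here is a minimal sketch of the pattern. The classes Length and Wrapper and the PY2 flag are made up for illustration and are not taken from the actual program. The literals carry an explicit u prefix so that the snippet does not depend on the unicode_literals import, and the version check is only there so that the same file also runs under Python 3.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2  # hypothetical helper flag


class Length(object):  # hypothetical example class
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        text = u'{} µm'.format(self.value)
        # In Python 2, a unicode result from __repr__() is implicitly
        # encoded with the ASCII codec and fails on 'µ', hence the
        # explicit encode. Python 3 expects text, so nothing is encoded.
        return text.encode('utf-8') if PY2 else text


class Wrapper(object):  # hypothetical example class
    def __init__(self, inner):
        self.inner = inner

    def __repr__(self):
        inner = repr(self.inner)
        if PY2:
            # repr() gave back UTF-8 bytes: decode before formatting...
            inner = inner.decode('utf-8')
        text = u'<{}>'.format(inner)
        # ...and encode again on the way out (the "ugly beast").
        return text.encode('utf-8') if PY2 else text


print(repr(Wrapper(Length(3))))
```

Every __repr__() ends up repeating the same encode step, and every place that nests one representation inside another needs the matching decode step, which is exactly the overhead described above.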
What would you recommend for writing simple __repr__() functions that behave nicely with respect to these encoding questions?