Since this question involves a lot of confusing unicode stuff, I thought I'd offer an analysis of what was going on here.
It all comes down to the implementation of __unicode__ and __repr__ of the builtin list class. Basically, it is equivalent to:
class list(object):
    def __repr__(self):
        return "[%s]" % ", ".join(repr(e) for e in self.elements)
    def __str__(self):
        return repr(self)
    def __unicode__(self):
        return str(self).decode()
Actually, list doesn't even define the __unicode__ and __str__ methods, which makes sense when you think about it.
When you write:
u"%s" % [a] # it expands to
u"%s" % unicode([a]) # which expands to
u"%s" % repr([a]).decode() # which expands to
u"%s" % ("[%s]" % repr(a)).decode() # (simplified a little bit)
u"%s" % ("[%s]" % unicode(a).encode('utf-8')).decode()
That last line is an expansion of repr(a), using the implementation of __repr__ in the question.
So as you can see, the object is first encoded as UTF-8, only to be decoded again with the system default encoding. In Python 2 that default is normally ASCII, so any non-ASCII character raises a UnicodeDecodeError.
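The encode-then-decode mismatch can be reproduced in isolation. This is only a minimal sketch: the sample string is made up, and the "ascii" codec is spelled out explicitly on the assumption that it is the default encoding in play.

```python
text = u"caf\xe9"                   # sample text with one non-ASCII character
utf8_bytes = text.encode("utf-8")   # stands in for what the question's __repr__ returns

try:
    utf8_bytes.decode("ascii")      # assumed default codec, written out explicitly
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True            # the UTF-8 bytes for the accented letter are not ASCII

# Decoding with the codec that was used for encoding gives the text back.
round_tripped = utf8_bytes.decode("utf-8")
```

Decoding with the same codec that did the encoding works; it is only the implicit default that breaks.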
As some of the other answers mentioned, you can write your own function, or even subclass list, like so:
class mylist(list):
    def __unicode__(self):
        return u"[%s]" % u", ".join(map(unicode, self))
Note that this format is not round-trippable. It can even be misleading:
>>> unicode(mylist([]))
u'[]'
>>> unicode(mylist(['']))
u'[]'
Of course, you can write a quote_unicode function to make it round-trippable, but this is the moment to ask yourself what the point is. The unicode and str functions are meant to create a representation of an object that makes sense to a user. For programmers, there's the repr function. Raw lists are not something a user is ever supposed to see. That's why the list class does not implement the __unicode__ method.
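If you do want the quoted variant anyway, here is a rough sketch of what it could look like. Both quote_unicode and format_list are hypothetical helpers written for this illustration, the escaping rules are my own choice, and text_type is just a trick to name the unicode type without hard-coding it.

```python
text_type = type(u"")  # the text type of the running interpreter

def quote_unicode(value):
    """Hypothetical helper: single-quote a text value, escaping backslashes and quotes."""
    escaped = value.replace(u"\\", u"\\\\").replace(u"'", u"\\'")
    return u"'%s'" % escaped

def format_list(items):
    """Hypothetical helper: bracketed, comma-separated, quoted text form of a sequence."""
    return u"[%s]" % u", ".join(quote_unicode(text_type(item)) for item in items)

empty = format_list([])           # no elements
one_empty = format_list([u""])    # one empty string, now visibly different
accented = format_list([u"caf\xe9", u"it's"])
```

With the quotes in place, an empty list and a list holding one empty string no longer look the same, which is exactly the ambiguity shown above.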
To get a somewhat better idea about what happens when, play with this little class:
class B(object):
    def __unicode__(self):
        return u"unicode"
    def __repr__(self):
        return "repr"
    def __str__(self):
        return "str"
>>> b = B()
>>> b
repr
>>> [b]
[repr]
>>> unicode(b)
u'unicode'
>>> unicode([b])
u'[repr]'
>>> print b
str
>>> print [b]
[repr]
>>> print unicode(b)
unicode
>>> print unicode([b])
[repr]
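The str/repr half of that session can also be written as a small script instead of being typed at the prompt. This is a sketch that deliberately leaves out __unicode__ and the unicode() calls, so it only covers the part of the behavior that does not depend on them.

```python
class B(object):
    def __repr__(self):
        return "repr"

    def __str__(self):
        return "str"

b = B()
direct = str(b)                       # the object's own __str__ is used
in_list = str([b])                    # the list formats its elements with repr()
list_repr = repr([b])                 # same text as str([b])
formatted = "%s and %s" % (b, [b])    # %s applies str() to each argument
```

The list never consults the element's __str__, which is why wrapping an object in a list changes how it prints.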