EDIT: (Major changes between this edit and the previous one... Note: I'm using Python 2.6.4 on an Ubuntu box.)
Firstly, in my first attempt at an answer, I provided some general information on print and str, which I'm going to leave below for the benefit of anybody having simpler issues with print and chancing upon this question.
As for a new attempt at dealing with the issue experienced by the OP... Basically, I'm inclined to say that there's no silver bullet here, and if print somehow manages to make sense of a weird string literal, then that's not reproducible behaviour. I'm led to this conclusion by the following funny interaction with Python in my terminal window:
>>> print '\xaa\xbb\xcc'
��
Have you tried to input ª»Ì directly from the terminal? At a Linux terminal using utf-8 as the encoding, this is actually read in as six bytes, which can then be made to look like three unicode chars with the help of the decode method:
>>> import sys
>>> 'ª»Ì'
'\xc2\xaa\xc2\xbb\xc3\x8c'
>>> 'ª»Ì'.decode(sys.stdin.encoding)
u'\xaa\xbb\xcc'
So, the '\xaa\xbb\xcc' literal only makes sense if you decode it as a latin-1 literal (well, actually you could use a different encoding which agrees with latin-1 on the relevant characters). As for print 'just working' in your case, it certainly doesn't for me -- as mentioned above.
This is explained by what happens when you use a string literal not prefixed with a u -- i.e. "asdf" rather than u"asdf". It's tempting to say that the resulting string will use some non-unicode encoding, but that's not quite right: as a matter of fact, the string object itself is going to be encoding-unaware (it's just a sequence of bytes), and you're going to have to treat it as if it was encoded with encoding x, for the correct value of x. This basic idea leads me to the following:
a = '\xAA\xBB\xCC'
a.decode('latin1')
# result: u'\xAA\xBB\xCC'
print(a.decode('latin1'))
# output: ª»Ì
Note the lack of decoding errors and the proper output (which I expect to stay proper on any other box). Apparently your string literal can be made sense of by Python, but not without some help.
Does this help? (At least in understanding how things work, if not in making the handling of encodings any easier...)
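To make the bytes-versus-characters point a bit more concrete, here is a minimal sketch of the same experiment. It uses a b'' literal so that it should mean the same thing on Python 2.6+ and on Python 3; the variable names are just mine, not anything from the question.

```python
# The three bytes from the question, as an explicit bytes literal.
raw = b'\xaa\xbb\xcc'

# latin-1 maps every byte to the code point with the same number,
# so this decode cannot fail and yields three characters.
text = raw.decode('latin-1')
code_points = [ord(ch) for ch in text]

# The same three characters take six bytes once encoded as UTF-8,
# which is what a UTF-8 terminal hands to Python when you type them.
utf8_bytes = text.encode('utf-8')

# Treating the original three bytes as UTF-8 does not work at all,
# because 0xAA is not a valid first byte of a UTF-8 sequence.
try:
    raw.decode('utf-8')
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

The point is only that the bytes carry no encoding label of their own: whether they turn into the intended characters depends entirely on the codec you name when decoding.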
Now for some funny bits with some explanatory value (hopefully)! This works fine for me:
import sys
sys.stdout.write("\xAA\xBB\xCC".decode('latin1').encode(sys.stdout.encoding))
Skipping either the decode or the encode part results in a unicode-related exception. Theoretically speaking, this makes sense: the decode is needed to decide what characters there are in the given string (the only thing obvious at first sight is what bytes there are -- the Python 3 idea of having (unicode) strings for characters and bytes for, well, bytes, suddenly seems superbly reasonable), while the encode is needed so that the output respects the encoding of the output stream. Now this
sys.stdout.write("ąöî\n".decode(sys.stdin.encoding).encode(sys.stdout.encoding))
also works as expected, but here the characters are coming from the keyboard and so are actually encoded with the stdin encoding... Also,
ord('ą'.decode('utf-8').encode('latin2'))
returns the correct 177 (my input encoding is utf-8), but '\xc4\x85'.encode('latin2') makes no sense to Python: it has no clue which characters '\xc4\x85' stands for, so it figures that trying the 'ascii' codec is the best it can do, and that fails.
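The decode-then-encode pattern from the last few examples can be sketched as a tiny helper. Note that transcode is a hypothetical name I'm making up here, not a standard-library function, and the snippet again sticks to b'' literals so it should behave the same on Python 2.6+ and Python 3.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes by going through text (hypothetical helper).

    The decode step decides which characters the bytes stand for;
    the encode step picks the byte representation for the target.
    """
    text = data.decode(source_encoding)
    return text.encode(target_encoding)

# The two bytes a UTF-8 terminal sends for the letter a-with-ogonek.
utf8_bytes = b'\xc4\x85'

# In latin2 (ISO 8859-2) the same letter is the single byte 0xB1.
latin2_bytes = transcode(utf8_bytes, 'utf-8', 'latin2')
latin2_value = ord(latin2_bytes)

# Skipping the decode step fails either way: Python 2 tries an implicit
# ascii decode (UnicodeDecodeError), while Python 3 bytes objects have
# no encode method at all (AttributeError).
try:
    utf8_bytes.encode('latin2')
    direct_encode_failed = False
except (AttributeError, UnicodeDecodeError):
    direct_encode_failed = True
```

For writing to a terminal you would pass sys.stdout.encoding as the target, with the caveat that it can be None when output is piped, so a real script would want a fallback there.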
The original answer:
The relevant bit of Python docs (for version 2.6.4) says that print(obj) is meant to print out the string given by str(obj). I suppose you could then wrap it in a call to unicode (as in unicode(str(obj))) to get a unicode string out -- or you could just use Python 3 and exchange this particular nuisance for a couple of different ones. ;-)
Incidentally, this shows that you can manipulate the result of printing an object just like you can manipulate the result of calling str on an object, that is by messing with the __str__ method. Example:
class Foo(object):
    def __str__(self):
        return "I'm a Foo!"

print Foo()
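If you'd like to check the print/str relationship rather than take my word for it, here is a small sketch that captures what print writes and compares it with str(obj). It's written for Python 3 (print as a function with a file argument), which is an assumption on my part since the rest of this answer is about 2.x; the buffer name is mine.

```python
import io

class Foo(object):
    def __str__(self):
        return "I'm a Foo!"

# Send print's output into an in-memory text buffer instead of the terminal.
buf = io.StringIO()
print(Foo(), file=buf)
printed = buf.getvalue()

# print writes str(obj) followed by a newline.
matches_str = (printed == str(Foo()) + "\n")
```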
As for the actual implementation of print, I expect this won't be useful at all, but if you really want to know what's going on... It's in the file Python/bltinmodule.c in the Python sources (I'm looking at version 2.6.4). Search for a line beginning with builtin_print. It's actually entirely straightforward, no magic going on there. :-)
Hopefully this answers your question... But if you do have a more arcane problem which I'm missing entirely, do comment, I'll make a second attempt. Also, I'm assuming we're dealing with Python 2.x; otherwise I guess I wouldn't have a useful comment.