Printing of Unicode strings relies on sys.stdout (the process's standard output) having a correct .encoding attribute that Python can use to encode the unicode string into a byte string for printing -- and that setting depends on how the OS is set up, where standard output is directed, and so forth. If there's no such attribute, the default codec, ascii, is used, and, as you've seen, it often does not provide the desired results;-).
You can check getattr(sys.stdout, 'encoding', None) to see if the encoding is there (if it is, you can just keep your fingers crossed that it's correct... or, maybe, try some heavily platform-specific trick to guess at the correct system encoding;-). If it isn't, there's in general no reliable or cross-platform way to guess what it could be. You could try 'utf8', the universal encoding that works in a lot of cases (surely more than ascii does;-), but it's really a spin of the roulette wheel.
For more reliability, your program should have its own configuration file to tell it what output encoding to use (maybe with 'utf8' as the default if not otherwise specified).
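One way that might look (the config dict and 'output_encoding' key are made-up names for illustration): read the codec from your own configuration, default to 'utf8', and validate it up front so a typo fails loudly rather than at print time:

```python
import codecs

# Minimal sketch with hypothetical config keys: fetch the program's
# configured output encoding, defaulting to 'utf8' when unspecified,
# and validate it immediately via codecs.lookup.
def output_codec_from_config(config):
    name = config.get('output_encoding', 'utf8')
    codecs.lookup(name)  # raises LookupError for an unknown codec name
    return name
```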
It's also better, for portability, to perform your own encoding, that is, not
print someunicode
but rather
print someunicode.encode(thecodec)
and actually, if you'd rather have incomplete output than a crash,
print someunicode.encode(thecodec, 'ignore')
(which simply skips non-encodable characters), or, usually better,
print someunicode.encode(thecodec, 'replace')
(which uses question-mark placeholders for non-encodable characters).
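To see the difference between the two error handlers, try them on a string that ascii can't encode:

```python
# A character outside ascii: u'café' (é is \xe9).
sample = u'caf\xe9'

# 'ignore' silently drops the non-encodable character.
assert sample.encode('ascii', 'ignore') == b'caf'

# 'replace' substitutes a question mark for it.
assert sample.encode('ascii', 'replace') == b'caf?'
```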