Assume for a moment that one cannot use print (and thus cannot benefit from its automatic encoding detection). That leaves us with sys.stdout. However, sys.stdout does not do any sensible encoding on its own.

Now one reads the Python wiki page PrintFails and goes to try out the following code:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)'

However, this does not work either (at least on a Mac). To see why:

>>> import locale, sys
>>> locale.getpreferredencoding()
'mac-roman'
>>> sys.stdout.encoding
'UTF-8'

(UTF-8 is what my terminal understands).

So one changes the above code to:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout)'

Now unicode strings are properly encoded when written to sys.stdout, and hence printed properly on the terminal (sys.stdout is attached to the terminal).

Is this the correct way to write unicode strings to sys.stdout, or should I be doing something else?

EDIT: Sometimes sys.stdout.encoding will be None (for example, when piping the output through less). In this case, the above code will fail.

+2  A: 

It's not clear to me why you wouldn't be able to use print; but assuming so, yes, the approach looks right to me.

Martin v. Löwis
One reason I cannot use `print` is to avoid that extra space `print` prints. Look at the use of `sys.stdout` here: http://stackoverflow.com/questions/1396820/apt-like-column-output-python-library/1397382#1397382
Sridhar Ratnakumar
You could build up complete lines, and then print them.
Martin v. Löwis
Bravo! Yes, in that case I can use `print`
Sridhar Ratnakumar
Adding a comma to the end makes print suppress the newline: `print "Some Text",`
Georg
Adding a comma suppresses the newline, but it still prints an extra space. Try running: python -c "print 2,; print 3,"
Sridhar Ratnakumar
Martin, even using `print` did not help when piping the output to `less`. logging.StreamHandler works fine though.
Sridhar Ratnakumar
If the output is to a pipe, it can't possibly know what encoding to use (as it can't know that less(1) is at the other end of the pipe). So your application will have to determine/decide the encoding for itself.
Martin v. Löwis
In Python 3 you can do `print(stuff, sep='', end='')` to avoid extra spaces. And I suspect the encoding problem isn't present there either.
ilya n.
+2  A: 

The best idea is to check whether you are directly connected to a terminal. If you are, use the terminal's encoding. Otherwise, use the system's preferred encoding.

import locale, sys

if sys.stdout.isatty():
    default_encoding = sys.stdout.encoding  # the terminal's encoding
else:
    default_encoding = locale.getpreferredencoding()  # piped or redirected

It's also very important to always allow the user to specify whichever encoding she wants. Usually I make it a command-line option (like -e ENCODING), and parse it with the optparse module.
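As a rough sketch of how the -e option could take priority over the automatic choice: the helper name `pick_encoding` and the long option `--encoding` are assumptions made up for this illustration, not part of any library. It falls back to the terminal's encoding, then to the locale's preferred encoding.

```python
import locale
import sys
from optparse import OptionParser


def pick_encoding(argv, stream=None):
    """Return the output encoding: -e option first, then terminal, then locale.

    `pick_encoding` is a hypothetical helper written for this example.
    """
    if stream is None:
        stream = sys.stdout

    parser = OptionParser()
    parser.add_option("-e", "--encoding", dest="encoding", default=None,
                      help="encoding to use for output")
    options, _args = parser.parse_args(argv)

    # 1. An explicit user choice always wins.
    if options.encoding:
        return options.encoding

    # 2. Attached to a terminal that reports an encoding: use it.
    isatty = getattr(stream, "isatty", lambda: False)
    if isatty() and getattr(stream, "encoding", None):
        return stream.encoding

    # 3. Piped or redirected (encoding may be None): use the locale setting.
    return locale.getpreferredencoding()
```

It would typically be called as `pick_encoding(sys.argv[1:])`. Checking that `stream.encoding` is not None in step 2 is a defensive choice on my part, since the encoding can be missing when the output is piped.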

It is also a good idea not to overwrite sys.stdout with an automatic encoder. Create your encoder and use it, but leave sys.stdout alone: you might import third-party libraries that write encoded bytestrings directly to sys.stdout.
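A minimal sketch of keeping the encoder separate, assuming you have a byte stream to wrap. The helper name `make_writer` and the `errors="replace"` default are my own choices for illustration. The example writes to an in-memory `io.BytesIO` so the encoded bytes can be inspected.

```python
import codecs
import io


def make_writer(byte_stream, encoding, errors="replace"):
    """Wrap a byte stream so that text written to it is encoded on the way out.

    `make_writer` is a hypothetical helper; codecs.getwriter() does the work.
    """
    return codecs.getwriter(encoding)(byte_stream, errors)


# Demonstrated on an in-memory byte stream instead of the real stdout.
buf = io.BytesIO()
out = make_writer(buf, "utf-8")
out.write(u"\u0411\n")      # CYRILLIC CAPITAL LETTER BE, then a newline
encoded = buf.getvalue()    # the UTF-8 bytes that reached the stream
```

In a real script the byte stream would be sys.stdout itself on Python 2, or `sys.stdout.buffer` on Python 3. Either way, sys.stdout keeps its original behaviour for any other code that writes to it.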

nosklo
+1  A: 

There is an optional environment variable, PYTHONIOENCODING, which may be set to the desired default encoding. Using it is one way of getting the user's preferred encoding that is consistent with the rest of Python. It is buried in the Python manual here.
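To see the effect, here is a small sketch that starts a child interpreter with PYTHONIOENCODING set and its output piped, then reads back what the child reports as `sys.stdout.encoding`. The helper name `child_stdout_encoding` is made up for this illustration.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set; return what it reports.

    `child_stdout_encoding` is a hypothetical helper written for this example.
    """
    # Copy the current environment and add or override the variable.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(str(sys.stdout.encoding))"],
        env=env,
    )
    return output.decode("ascii").strip()
```

Because the child's stdout is a pipe here, this is the same situation as piping through less: without the variable, Python 2 would report None.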

daveagp