views:

2279

answers:

3

When the output of a Python program is piped, the interpreter cannot detect the terminal's encoding and sets sys.stdout.encoding to None. This means a program like this:

# -*- coding: utf-8 -*-
print "åäö"

will work fine when run normally, but fail with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipe sequence.

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen so far are to modify site.py directly, or to hardcode the default encoding using this hack:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print "åäö"

Is there a better way to make piping work?

+7  A: 

Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, you must encode it yourself.

A rule of thumb is: always use Unicode internally. Decode what you receive, encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

Another didactic example is a Python program that converts from ISO-8859-1 to UTF-8, making everything uppercase in between.

import sys
for line in sys.stdin:
    # decode what you receive:
    line = line.decode('iso8859-1')

    # work with unicode internally:
    line = line.upper()

    # encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)

Setting the system default encoding is a bad idea, because some modules and libraries you use can rely on it being ASCII. Don't do it.

nosklo
The problem is that the user doesn't want to specify the encoding explicitly. He just wants to use Unicode for I/O. And the encoding he uses should be the encoding specified in the locale settings, not in the terminal application settings. AFAIK, Python 3 uses the *locale* encoding in this case. Changing `sys.stdout` seems like a more pleasant way.
Andrey Vlasovskikh
Encoding / decoding every string explicitly is bound to cause bugs when an encode or decode call is missing, or added once too many, somewhere. The output encoding can be set when output is a terminal, so it can be set when output is not a terminal. There is even a standard LC_CTYPE environment variable to specify it. It is a bug in Python that it doesn't respect this.
Rasmus Kaj
@Rasmus Kaj: If you consistently use a defined function for output you can be sure that it won't be missing or duplicated. Output encoding can't be "set". Accepting only unicode on `sys.stdout` (by replacing it with `codecs.getwriter`) breaks a lot of libraries in practice.
nosklo
+12  A: 

First, regarding this solution:

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

It's not practical to explicitly print with a given encoding every time. That would be repetitive and error-prone.

A better solution is to change sys.stdout at the start of your program, to encode with a selected encoding. Here is one solution I found on "Python: How is sys.stdout.encoding chosen?", in particular a comment by "toka":

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
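A variant of this guards the wrap so that a stream whose encoding was already detected is left alone (a sketch; the `None` check matters on Python 2, where piped output has no detected encoding, and the UTF-8 fallback is an assumption, not something Python chooses for you):

```python
import codecs
import sys

# Wrap stdout only when Python could not detect an output encoding
# (in Python 2, sys.stdout.encoding is None when output is piped).
if getattr(sys.stdout, 'encoding', None) is None:
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
```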
Craig McQueen
unfortunately, changing sys.stdout to accept only unicode breaks a lot of libraries that expect it to accept encoded bytestrings.
nosklo
nosklo: Then how can it work reliably and automatically when output is a terminal?
Rasmus Kaj
@Rasmus Kaj: just define your own unicode printing function and use it every time you want to print unicode: `def myprint(unicodeobj): print unicodeobj.encode('utf-8')` -- you automatically detect terminal encoding by inspecting `sys.stdout.encoding`, but you should consider the case where it is `None` (i.e. when redirecting output to a file) so you need a separate function anyway.
nosklo
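The helper nosklo describes can be fleshed out as follows (a sketch; the name `myprint`, the `out` parameter, and the UTF-8 fallback are illustrative choices, not part of any standard API):

```python
import sys

def myprint(text, out=None):
    """Print a unicode string, encoding it for the output stream."""
    out = out or sys.stdout
    # Use the detected terminal encoding; fall back to UTF-8 when
    # none was detected (e.g. piped output in Python 2).
    encoding = getattr(out, 'encoding', None) or 'utf-8'
    data = text.encode(encoding)
    # On Python 3 write bytes to the underlying binary buffer;
    # on Python 2 the stream accepts bytes directly.
    getattr(out, 'buffer', out).write(data + b'\n')
```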
+1  A: 

You may want to try setting the environment variable PYTHONIOENCODING to "utf_8". I have written a page on my ordeal with this problem.
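The effect of the variable can be checked from within Python itself by launching a child interpreter with it set (a sketch using the standard subprocess module; "utf-8" here is just one spelling of the codec name):

```python
import os
import subprocess
import sys

# Run a child interpreter with PYTHONIOENCODING set and ask it
# which encoding it picked for stdout, even when output is a pipe.
env = dict(os.environ, PYTHONIOENCODING='utf-8')
out = subprocess.check_output(
    [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
    env=env,
)
print(out.strip())
```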

daveagp