ansaurus

Question

Python: UnicodeEncodeError when reading from stdin

Answer 1

+2 A:

The problem is that Python (2.x) falls back to the system default encoding whenever the byte strings read from stdin are implicitly converted to or from unicode:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

The input is very likely UTF-8 or Windows-1252 (CP-1252), so the program chokes on non-ASCII characters.

To convert sys.stdin to a stream with the proper decoder, I used:

import codecs
import sys
# Wrap stdin so that reads return decoded unicode text
char_stream = codecs.getreader("utf-8")(sys.stdin)

That fixed the problem.
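
To see the wrapper at work without a real terminal, here is a minimal sketch that uses an in-memory byte stream as a stand-in for sys.stdin. The sample text and the names raw_stream and text are made up for illustration; only codecs.getreader comes from the fix above.

```python
import codecs
import io

# Stand-in for sys.stdin: a byte stream holding UTF-8 encoded text.
raw_stream = io.BytesIO(u"Gr\u00fc\u00dfe\n".encode("utf-8"))

# Wrap the byte stream so that reads return decoded text, not bytes.
char_stream = codecs.getreader("utf-8")(raw_stream)
text = char_stream.read()
```

After the read, text is a unicode string, so later code no longer depends on the 'ascii' default.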

BTW, this is the method ANTLR's FileStream uses to open a file with a given filename (instead of a given stream):

    fp = codecs.open(fileName, 'rb', encoding)
    try:
        data = fp.read()
    finally:
        fp.close()
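
The same read can probably be written more compactly with io.open and a with block (Python 2.6 and later). This is a sketch of an alternative, not ANTLR's actual code; read_text and the temporary file are hypothetical and only there to make the example self-contained.

```python
import io
import os
import tempfile


def read_text(file_name, encoding):
    # The with block closes the file even if read() raises.
    with io.open(file_name, "r", encoding=encoding) as fp:
        return fp.read()


# Demo: write UTF-8 bytes to a temporary file, then read them back as text.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as raw:
        raw.write(u"Gr\u00fc\u00dfe".encode("utf-8"))
    data = read_text(path, "utf-8")
finally:
    os.remove(path)
```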

BTW #2: For strings I found

a_string.encode(encoding)

useful.
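
A small sketch of what encode does with a non-ASCII string; the sample value and variable names are only for illustration. It also shows where the UnicodeEncodeError from the question comes from when the target codec is 'ascii'.

```python
text = u"caf\u00e9"

# The same text yields different bytes depending on the encoding.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# 'ascii' cannot represent the accented character, so encoding fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An error handler can be passed to substitute instead of failing.
ascii_replaced = text.encode("ascii", "replace")
```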

hansfbaier 2010-03-18 06:41:02

Answer 2

+1 A:

You're not getting this error on input; you're getting it when trying to output the data you read. You should be decoding the data you read and passing unicode objects around, instead of dealing with bytestrings the whole time.
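
As a rough sketch of that pattern: decode once where bytes come in, work on unicode in between, and encode once where bytes go out. The function shout and the in-memory output stream are hypothetical, and UTF-8 is an assumption about the actual data.

```python
import io


def shout(raw_bytes, encoding="utf-8"):
    # Decode once at the input boundary.
    text = raw_bytes.decode(encoding)
    # Work on unicode text in between.
    result = text.upper()
    # Encode once at the output boundary.
    return result.encode(encoding)


# Stand-in for an output stream that expects bytes.
output = io.BytesIO()
output.write(shout(u"na\u00efve\n".encode("utf-8")))
```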

Ignacio Vazquez-Abrams 2010-03-18 06:41:10

Yes, but I am dealing with foreign code here

hansfbaier 2010-03-20 02:23:35
