ansaurus

Question

Answer 1

A:

I'm not too sure about input encoding, but I've found that with output encoding to tty streams, an explicit encoding step was needed for Python 2.x but not for Python 3.x.

So for input you may need an explicit decode step using e.g. l.decode(sys.stdin.encoding).
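A minimal sketch of such a decode step, assuming the line arrives as raw bytes (as it does from sys.stdin in Python 2.x, or from sys.stdin.buffer in Python 3.x). The helper name decode_line and the UTF-8 fallback are illustrative assumptions, not something from the question:

```python
import sys

def decode_line(raw, encoding=None):
    """Decode raw bytes using the encoding the console reports for stdin."""
    # Assumption: fall back to UTF-8 if the stream reports no encoding at all.
    encoding = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
    return raw.decode(encoding)
```

Called as decode_line(l), this is equivalent to l.decode(sys.stdin.encoding) whenever the console reports an encoding.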

Does it work OK in a vanilla Python console?

Vinay Sajip 2009-07-27 14:31:48

Answer 2

+2 A:

Well, I don't know how to fix it, but I have deduced the pattern in what goes wrong.

The bytes that get replaced with "?" are precisely those bytes that are not defined in windows-1252 - that is, bytes 0x81, 0x8d, 0x8f, 0x90, and 0x9d.
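One way to check that list is to ask Python's own cp1252 codec which single bytes it refuses to decode. This is a small sketch for Python 3; the function name is made up for illustration:

```python
def undefined_in_cp1252():
    """Return the byte values that Python's cp1252 codec cannot decode."""
    undefined = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # No character is assigned to this byte in windows-1252.
            undefined.append(value)
    return undefined

print([hex(b) for b in undefined_in_cp1252()])
# prints ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```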

What this looks like to me is that somehow you're getting this series of translations:

unicode input -> series of bytes in utf-8
utf-8 bytes -> read by something that expects the input to be Windows-1252, and so translates impossible bytes to "?"
the resulting characters are converted back to bytes via windows-1252, and fed into your variable l.
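The chain above can be imitated in Python 3 to see whether it reproduces the symptom. This is only a sketch of the hypothesis: it assumes the intermediate layer substitutes a replacement character for bytes it cannot decode, which then becomes "?" on the way back out. The function name is hypothetical.

```python
def simulate_round_trip(text):
    """Imitate: unicode -> UTF-8 bytes -> misread as cp1252 -> cp1252 bytes."""
    utf8_bytes = text.encode("utf-8")
    # Bytes undefined in windows-1252 become U+FFFD at this step.
    misread = utf8_bytes.decode("cp1252", errors="replace")
    # U+FFFD has no windows-1252 encoding, so it is written out as "?".
    return misread.encode("cp1252", errors="replace")

# U+0410 is D0 90 in UTF-8; 0x90 is undefined in windows-1252.
print(simulate_round_trip("\u0410"))  # b'\xd0?'
# U+0411 is D0 91 in UTF-8; both bytes are defined, so they survive intact.
print(simulate_round_trip("\u0411"))  # b'\xd0\x91'
```

Under this model every byte that windows-1252 defines passes through unchanged, and only the five undefined bytes turn into "?", which matches the observed pattern.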

Does this version of pydev give sys.stdin.encoding a decent value? And how does sys.stdin.encoding compare to the result of sys.getdefaultencoding()?
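A quick way to answer both questions from inside the console is to print the values side by side. A small sketch; the function name is illustrative, and getattr is used because a replaced stdin object may not have an encoding attribute:

```python
import sys

def report_encodings():
    """Collect the encodings the interpreter reports for the console."""
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "default": sys.getdefaultencoding(),
    }

for name, value in sorted(report_encodings().items()):
    print("%s: %s" % (name, value))
```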

Daniel Martin 2009-07-27 17:15:26

Very plausible explanation, thanks. sys.stdin.encoding == sys.getdefaultencoding() == 'utf-8'

ymv 2009-07-27 18:50:18


Bizarre eclipse-pydev console behavior
