First I change the Windows CMD encoding to UTF-8 and run the Python interpreter:

    chcp 65001
    python

Then I try to print a unicode string inside it, and when I do, Python crashes in a peculiar way (I just get the cmd prompt back in the same window).

    >>> import sys
    >>> print u'ëèæîð'.encode(sys.stdin.encoding)

Any ideas why it happens and how to make it work?

UPD: sys.stdin.encoding returns 'cp65001'

UPD2: It just came to me that the issue might be connected with the fact that UTF-8 uses a multi-byte character set (kcwu made a good point on that). I tried running the whole example with 'windows-1250' and got 'ëeaî?'. Windows-1250 uses a single-byte character set, so it worked for those characters it understands. However, I still have no idea how to make 'utf-8' work here.
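The multi-byte point can be seen concretely by comparing byte lengths. A small sketch (Python 3 syntax; the string literal matches the one in the question):

```python
# -*- coding: utf-8 -*-
s = u"ëèæîð"

# UTF-8 is multi-byte: each of these accented characters takes 2 bytes.
print(len(s.encode("utf-8")))                     # 10

# windows-1250 is single-byte: one byte per character, and characters
# it cannot represent become '?' when errors="replace" is used.
print(len(s.encode("windows-1250", "replace")))   # 5
```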

UPD3: Oh, I found out it is a known Python bug. I guess what happens is that Python copies the cmd encoding as 'cp65001' to sys.stdin.encoding and tries to apply it to all the input. Since it fails to recognize 'cp65001', it crashes on any input that contains non-ASCII characters.
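For what it's worth, the name lookup can be worked around at runtime by teaching Python that cp65001 is just UTF-8. This is a sketch (Python 3 syntax) using the standard `codecs.register` hook; Python 3.3+ already ships this mapping, so there it is effectively a no-op, and on the old interpreters affected by the bug the console write itself may still fail:

```python
import codecs

def cp65001_search(name):
    # Resolve the name Windows reports for "chcp 65001" to the UTF-8 codec.
    if name.lower() in ("cp65001", "65001"):
        return codecs.lookup("utf-8")
    return None  # let other search functions handle everything else

codecs.register(cp65001_search)

# Encoding with the cmd-reported name now behaves like UTF-8:
print(u"ëèæîð".encode("cp65001") == u"ëèæîð".encode("utf-8"))  # True
```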

A: 

This is because the "code page" of cmd is different from the "mbcs" of the system. Although you changed the code page, Python (actually, Windows) still thinks your "mbcs" hasn't changed.

kcwu
A: 

Do you want Python to encode to UTF-8?

    >>> print u'ëèæîð'.encode('utf-8')
    ëèæîð

Python will not recognize cp65001 as UTF-8.

jcoon
+1  A: 

A few comments: you probably misspelled 'encodig' and '.code'. Here is my run of your example.

    C:\>chcp 65001
    Active code page: 65001

    C:\>\python25\python
    ...
    >>> import sys
    >>> sys.stdin.encoding
    'cp65001'
    >>> s=u'\u0065\u0066'
    >>> s
    u'ef'
    >>> s.encode(sys.stdin.encoding)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    LookupError: unknown encoding: cp65001
    >>>

The conclusion: cp65001 is not a known encoding for Python. Try 'UTF-16' or something similar.
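The traceback above can be generalized: `codecs.lookup` is the machinery that raises the `LookupError`, so you can probe which names a given interpreter accepts. A small sketch (the helper name is mine):

```python
import codecs

def codec_known(name):
    """Return True if this interpreter can resolve the encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

print(codec_known("utf-8"))    # True on any Python
print(codec_known("utf-16"))   # True on any Python
# Old interpreters returned False here, which is exactly why
# s.encode(sys.stdin.encoding) raised "unknown encoding: cp65001":
print(codec_known("cp65001"))
```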

gimel
Yes, I definitely misspelled it, but I tried it the right way and got the same crash. (This actually proves that the interpreter never got to evaluate the misspelled 'encode()' and 'encoding' attributes and crashed while processing 'ëèæîð'.) I fixed the typo.
Alex
A: 

I had this annoying issue too, and I hated not being able to run my unicode-aware scripts the same way in MS Windows as in Linux. So I managed to come up with a workaround.

Take this script (say, uniconsole.py in your site-packages or whatever):

    import sys, os

    if sys.platform == "win32":

        class UniStream(object):
            __slots__ = "fileno", "softspace"

            def __init__(self, fileobject):
                self.fileno = fileobject.fileno()
                self.softspace = False

            def write(self, text):
                # Bypass the console stream's broken cp65001 encoder:
                # write UTF-8 bytes straight to the file descriptor.
                if isinstance(text, unicode):
                    os.write(self.fileno, text.encode("utf_8"))
                else:
                    os.write(self.fileno, text)

        sys.stdout = UniStream(sys.stdout)
        sys.stderr = UniStream(sys.stderr)

This seems to work around the python bug (or win32 unicode console bug, whatever). Then I added in all related scripts:

    try: import uniconsole
    except ImportError: sys.exc_clear()  # could be just pass, of course
    else: del uniconsole  # reduce pollution, not needed anymore

Finally, I just run my scripts as needed in a console where chcp 65001 is run and the font is Lucida Console. (How I wish that DejaVu Sans Mono could be used instead… but hacking the registry and selecting it as a console font reverts to a bitmap font.)

This is a quick-and-dirty stdout and stderr replacement, and it does not handle any raw_input-related bugs (obviously, since it doesn't touch sys.stdin at all). By the way, I've also added the cp65001 alias for utf_8 in the encodings\aliases.py file of the standard lib.
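The same fd-level trick can be demonstrated without a Windows console. The sketch below is my Python 3 restatement of the wrapper (Python 3's `str` plays the role of Python 2's `unicode`), using a temporary file to stand in for the console's file descriptor:

```python
import os
import tempfile

class UniStream:
    """Write text as UTF-8 bytes directly to the underlying descriptor,
    bypassing the stream's own (possibly broken) encoder."""
    def __init__(self, fileobject):
        self._fd = fileobject.fileno()

    def write(self, text):
        if isinstance(text, str):           # str here == Python 2 unicode
            text = text.encode("utf-8")
        os.write(self._fd, text)

# Demonstrate with a temporary file standing in for the console:
with tempfile.TemporaryFile() as f:
    UniStream(f).write(u"ëèæîð")
    f.seek(0)
    print(f.read() == u"ëèæîð".encode("utf-8"))  # True
```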

ΤΖΩΤΖΙΟΥ