When Python raises a WindowsError, the message of the exception is always encoded in the OS's native encoding, which causes trouble. For example:

import os
os.remove('does_not_exist.file')

Well, here we get an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 2] 系統找不到指定的檔案。: 'does_not_exist.file'

Since my Windows 7 is set to Traditional Chinese, the default error message I get is in Big5 encoding (also known as CP950).

>>> try:
...     os.remove('abc.file')
... except WindowsError, value:
...     print value.args
...
(2, '\xa8t\xb2\xce\xa7\xe4\xa4\xa3\xa8\xec\xab\xfc\xa9w\xaa\xba\xc0\xc9\xae\xd7\xa1C')
>>>

As you can see, the error message is a byte string rather than Unicode, so I get another encoding exception when I try to print it. This is a known issue in the Python issue tracker: http://bugs.python.org/issue1754

The question is: how do I work around this? How do I find out the native encoding of a WindowsError message? I am using Python 2.6.

Thanks.

A: 

sys.getfilesystemencoding() should help.

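import os, sys
try:
    os.remove('nosuchfile.txt')
except WindowsError, ex:
    # On Windows this returns 'mbcs', i.e. the ANSI code page
    enc = sys.getfilesystemencoding()
    # strerror and filename are both byte strings, so decode both before formatting
    print (u"%s: %s" % (ex.strerror.decode(enc), ex.filename.decode(enc))).encode(enc)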

For purposes other than printing to the console, you may want to change the final encoding to 'utf-8'.
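
A slightly more defensive variant of the same idea, as a sketch only: the helper name `to_text` and the `'replace'` fallback are assumptions made for illustration, not something the answer above prescribes. It decodes a native-encoded message to Unicode and avoids raising a second encoding error on bad bytes.

```python
import locale


def to_text(message, encoding=None):
    """Return `message` as unicode text, decoding byte strings if needed."""
    if not isinstance(message, bytes):
        # Already unicode text, nothing to decode.
        return message
    if encoding is None:
        # Assumption: the OS produced the message in the locale's encoding.
        encoding = locale.getpreferredencoding(False) or 'ascii'
    # 'replace' substitutes U+FFFD for undecodable bytes instead of raising.
    return message.decode(encoding, 'replace')


# The byte string from the question, decoded explicitly as cp950.
raw = (b'\xa8t\xb2\xce\xa7\xe4\xa4\xa3\xa8\xec\xab\xfc'
       b'\xa9w\xaa\xba\xc0\xc9\xae\xd7\xa1C')
text = to_text(raw, 'cp950')
```

In an `except WindowsError` block, something like `to_text(ex.strerror)` would then yield a Unicode string that can be encoded for whichever output is in use.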

Alex Lebedev
A: 

That is just the repr() string of the same error message. Since your console already supports cp950, just print the component you want. This works on my system after reconfiguring to use cp950 in my console. I had to explicitly raise the error message since my system is English and not Chinese:

>>> try:
...     raise WindowsError(2,'系統找不到指定的檔案。')
... except WindowsError, value:
...     print value.args
...
(2, '\xa8t\xb2\xce\xa7\xe4\xa4\xa3\xa8\xec\xab\xfc\xa9w\xaa\xba\xc0\xc9\xae\xd7\xa1C')
>>> try:
...     raise WindowsError(2,'系統找不到指定的檔案。')
... except WindowsError, value:
...     print value.args[1]
...
系統找不到指定的檔案。

Alternatively, use Python 3.X. It prints repr() using the console encoding. Here's an example:

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '系統找不到指定的檔案。'
'\xa8t\xb2\xce\xa7\xe4\xa4\xa3\xa8\xec\xab\xfc\xa9w\xaa\xba\xc0\xc9\xae\xd7\xa1C'

Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '系統找不到指定的檔案。'
'系統找不到指定的檔案。'
Mark Tolonen
Actually, I get the exception when I try to write the error message to a logger, which is why I have to deal with the problem. The logger's handler might be a file, the console, or even SMTP. Also, the console's encoding might not be the same as that of the Windows OS: when the program runs in IDLE or PyDev, the encoding seems to be UTF-8 rather than CP950, and only when it runs from the Windows CMD prompt is it the locale encoding.
Victor Lin
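
For the logging case described in this comment, one possible approach is to decode the message once, at the point where it is caught, and give each handler its own explicit output encoding. The following is a minimal sketch under stated assumptions: the logger name `winerror_demo`, the temporary log file, and the choice of UTF-8 are all hypothetical, and the byte string is the cp950 message from the question.

```python
import io
import logging
import os
import tempfile

# Native (cp950) message bytes, as shown in the question.
raw = (b'\xa8t\xb2\xce\xa7\xe4\xa4\xa3\xa8\xec\xab\xfc'
       b'\xa9w\xaa\xba\xc0\xc9\xae\xd7\xa1C')
# Decode once at the boundary; everything after this point is unicode.
message = raw.decode('cp950')

# The file handler is told explicitly to write UTF-8.
log_path = os.path.join(tempfile.mkdtemp(), 'errors.log')
handler = logging.FileHandler(log_path, encoding='utf-8')

logger = logging.getLogger('winerror_demo')
logger.propagate = False
logger.setLevel(logging.ERROR)
logger.addHandler(handler)

logger.error(u'[Error %d] %s', 2, message)
handler.close()

# Read the log back as UTF-8 to check what was written.
with io.open(log_path, encoding='utf-8') as f:
    logged = f.read()
```

With this arrangement the logged text no longer depends on whether the program runs under CMD, IDLE, or PyDev, because the handler, not the console, decides the encoding.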
+1  A: 

We have the same problem in the Russian version of MS Windows: the code page of the default locale is cp1251, but the default code page of the Windows console is cp866:

>>> import sys
>>> print sys.stdout.encoding
cp866
>>> import locale
>>> print locale.getdefaultlocale()
('ru_RU', 'cp1251')

The solution should be to decode the Windows message with the default locale encoding:

>>> import os
>>> try:
...     os.remove('abc.file')
... except WindowsError, err:
...     print err.args[1].decode(locale.getdefaultlocale()[1])
...
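
The mismatch between the two code pages can be illustrated without a Russian Windows install. This is a sketch only: the helper name `for_console` is hypothetical, and the two encodings are passed in explicitly here, whereas on a real system they would presumably come from `locale.getdefaultlocale()[1]` and `sys.stdout.encoding`.

```python
def for_console(raw, source_encoding, console_encoding):
    """Re-encode a byte-string message from one code page to another."""
    # Decode with the encoding the OS used for the message...
    text = raw.decode(source_encoding, 'replace')
    # ...then encode for the console; 'replace' avoids a second exception
    # when the console code page lacks a character.
    return text.encode(console_encoding, 'replace')


# The word "Файл" as the locale (cp1251) would deliver it.
ansi = u'\u0424\u0430\u0439\u043b'.encode('cp1251')
# The same word as the cp866 console expects it.
oem = for_console(ansi, 'cp1251', 'cp866')
```

The two byte strings differ even though they represent the same text, which is why printing the undecoded message to the console produces garbage.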

The bad news is that you still can't use exc_info=True in logging.error().
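
One possible direction for the exc_info case, offered as an untested-on-2.6 sketch rather than a confirmed fix: a formatter subclass that decodes the traceback text before it is joined to the message. The class name `DecodingFormatter` and its `encoding` parameter are hypothetical; whether this is sufficient on Python 2.6 also depends on how each handler writes its output.

```python
import io
import locale
import logging


class DecodingFormatter(logging.Formatter):
    """Hypothetical formatter that returns traceback text as unicode."""

    def __init__(self, fmt=None, datefmt=None, encoding=None):
        logging.Formatter.__init__(self, fmt, datefmt)
        # Assumption: tracebacks carry messages in the locale's encoding.
        self.encoding = encoding or locale.getpreferredencoding(False)

    def formatException(self, exc_info):
        text = logging.Formatter.formatException(self, exc_info)
        if isinstance(text, bytes):
            # Byte-string traceback: decode it, never raise on bad bytes.
            text = text.decode(self.encoding, 'replace')
        return text


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(DecodingFormatter(u'%(levelname)s %(message)s'))

logger = logging.getLogger('exc_info_demo')
logger.propagate = False
logger.addHandler(handler)

try:
    raise OSError(2, 'No such file')
except OSError:
    logger.error(u'remove failed', exc_info=True)

output = stream.getvalue()
```

The format string is given as unicode on purpose, so that the formatted message and the decoded traceback are both unicode when they are concatenated.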

newtover