tags:

views:

2298

answers:

4

When I try to print a Unicode string in a windows console, I get a "UnicodeEncodeError: 'charmap' codec can't encode character ...." error. I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this? Is there any way I can make Python automatically print a "?" instead of failing in this situation?

Edit: I'm using Python 2.5.

A: 

What version of Python are you on? I've seen references that this was broken in 2.4.3 and fixed in 2.4.4.

Stu
+5  A: 

Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):

PrintFails - Python Wiki

Lasse V. Karlsen
+2  A: 

The cause of your problem is NOT the Win console not willing to accept Unicode (as it does this since I guess Win2k by default). It is the default system encoding. Try this code and see what it gives you:

import sys
sys.getdefaultencoding()

if it says ascii, there's your cause ;-) You have to create a file called sitecustomize.py and put it under python path (I put it under /usr/lib/python2.5/site-packages, but that is differen on Win - it is c:\python\lib\site-packages or something), with the following contents:

import sys
sys.setdefaultencoding('utf-8')

and perhaps you might want to specify the encoding in your files as well:

# -*- coding: UTF-8 -*-
import sys,time

Edit: more info can be found in excellent the Dive into Python book

Bartosz Radaczyński
setdefaultencoding() is nolonger in sys (as of v2.0 according to the module docs).
Jon Cage
hmmmm, strange... will look into it.
Bartosz Radaczyński
I cannot prove it right now, but I know that I've used this trick on a later version - 2.5 on Windows.
Bartosz Radaczyński
OK, after quite a while I have found out that: "This function is only intended to be used by the site module implementation and, where needed, by sitecustomize. Once used by the site module, it is removed from the sys module’s namespace."
Bartosz Radaczyński
Any encoding Python outputs must be matched by the console that going to be printing it. Setting the output encoding to UTF-8 is all very well, but if the console's not UTF-8 (and on Windows it won't ever be), you'll get garbage instead of non-ASCII characters.
bobince
actually you can set the windows console to be utf-8. you need to say chcp 65001 and it will be unicode.
Bartosz Radaczyński
+4  A: 

The below code will make Python output to console as UTF-8 even on Windows.

The console will display the characters well on Windows 7 but on Windows XP it will not display them well, but at least it will work and most important you will have a consistent output from your script on all platforms. You'll be able to redirect the output to a file.

Below code was tested with Python 2.6 on Windows.


#!/usr/bin/python
# -*- coding: UTF-8 -*-

import codecs, sys, win32console

reload(sys)
sys.setdefaultencoding('utf-8')

print sys.getdefaultencoding()

if sys.platform == 'win32':
    try:
        import win32console 
    except:
        print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
        exit(-1)
    # win32console implementation  of SetConsoleCP does not return a value
    # CP_UTF8 = 65001
    win32console.SetConsoleCP(65001)
    if (win32console.GetConsoleCP() != 65001):
        raise Exception ("Cannot set console codepage to 65001 (UTF-8)")
    win32console.SetConsoleOutputCP(65001)
    if (win32console.GetConsoleOutputCP() != 65001):
        raise Exception ("Cannot set console output codepage to 65001 (UTF-8)")

#import sys, codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
Sorin Sbarnea