I'm using Python 2.6 on Windows 7

I borrowed some code from here: http://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console

My goal is to be able to display UTF-8 strings in the Windows console.

Apparently, in Python 2.6,

sys.setdefaultencoding()

is no longer supported.

However, I called reload(sys) before I tried to use it, and it magically didn't raise an error.

This code will NOT error, but it shows funny characters instead of Japanese text. I believe the problem is that I have not successfully changed the code page of the Windows console.

These are my attempts, but they don't work:

reload(sys)
sys.setdefaultencoding('utf-8')

print os.popen('chcp 65001').read()

sys.stdout.encoding = 'cp65001'

Perhaps you can use win32console to change the code page? I tried the code from the page I linked, but it also raised an error from win32console. Maybe that code is obsolete.

Here's my code, which doesn't error but prints funny characters:

#coding=<utf8>
import os
import sys
import codecs



reload(sys)
sys.setdefaultencoding('utf-8')
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

#print os.popen('chcp 65001').read()
print(sys.stdout.encoding)
sys.stdout.encoding = 'cp65001'
print(sys.stdout.encoding)

x = raw_input('press enter to continue')

a = 'こんにちは世界'#.decode('utf8')
print a

x = raw_input()
+2  A: 

Never ever ever use setdefaultencoding. If you want to write unicode strings to stdio, encode them explicitly. Monkeying around with setdefaultencoding will cause stdlib modules and third-party modules alike to break in horrible subtle ways by allowing implicit conversion between str and unicode when it shouldn't happen.

Yes, the problem is most likely that your code page isn't set properly. However, using os.popen won't change the code page; it'll spawn a new shell, change that shell's code page, and then immediately exit without affecting your console at all. I'm not personally very familiar with Windows, so I couldn't tell you how to change your console's code page from within your Python program.
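As a rough illustration of why a change made in a spawned process never reaches the parent, here is a minimal sketch. It uses an environment variable as a stand-in for the code page (an analogy, not the console API), and the helper name and the variable name DEMO_CODEPAGE are made up for the example.

```python
import os
import subprocess
import sys


def child_change_is_visible_in_parent(name="DEMO_CODEPAGE"):
    """Let a child process change its own state, then check the parent's copy."""
    # The parent starts with one value.
    os.environ[name] = "437"
    # The child overwrites the value in its own environment and exits.
    child_code = "import os; os.environ[%r] = '65001'" % name
    subprocess.call([sys.executable, "-c", child_code])
    # The parent still sees its original value, so this returns False.
    return os.environ[name] == "65001"
```

The same reasoning applies to os.popen('chcp 65001'): the setting is changed in the short-lived child shell, not in the calling program.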

The way to properly display unicode data via utf-8 from python, as mentioned before, is to explicitly encode your strings before printing them: print s.encode('utf-8')
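A minimal sketch of that explicit-encoding approach, assuming the text is held as a unicode string. The helper name to_console_bytes and the choice of the "replace" error handler are illustrative assumptions, not something the original code uses.

```python
# -*- coding: utf-8 -*-


def to_console_bytes(text, encoding="utf-8"):
    """Explicitly encode a unicode string to a byte string.

    Characters the target encoding cannot represent are replaced
    with '?' instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, "replace")


# Keep text as unicode inside the program and encode only at the boundary.
greeting = u"こんにちは世界"
data = to_console_bytes(greeting)
```

On Python 2 the resulting byte string can be passed to print directly (print data). Whether it displays correctly still depends on the console using a matching code page.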

Aaron Gallagher
Regarding "Never ever ever use setdefaultencoding": I do not think your reasoning for this is valid; it is insufficient at best. In fact, it is OK to set it to 'utf-8', as ASCII is only a subset of it. If setting it causes a problem in a module, that is a bug in the module. If you disagree, could you show us counterexamples?
OTZ
@otz, the stdlib and many, many third-party libraries assume ASCII is the default Python encoding. There's a good discussion of why setting the default encoding is silly here: http://faassen.n--tree.net/blog/view/weblog/2005/08/02/0
Aaron Gallagher
@otz, some other things not covered by that article: mixing text (unicode strings) and bytes is a nonsense operation. If the bytes represent text, they should be decoded to unicode first. Increasing the likelihood that a meaningless operation will accidentally succeed without any warning is not exactly the best thing if you want to write sane code. As I already said, a lot of existing Python code relies on ASCII being the default; if implicit encodings were turned off, the code would break.
Aaron Gallagher
A: 

Windows doesn't support UTF-8 in a console properly. The only way I know of to display Japanese in the console is by changing (on XP) Control Panel's Regional and Language Options, Advanced Tab, Language for non-Unicode Programs to Japanese. After rebooting, open a console and run "chcp" to find out the Japanese console's code page. Then either print Unicode strings or byte strings explicitly encoded in the correct code page.
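A hedged sketch of the last step, encoding for whatever code page the console reports. It assumes the Win32 function GetConsoleOutputCP is reachable through ctypes, and it falls back to UTF-8 when not running on Windows or when no console is attached. The function names are made up for the example.

```python
import ctypes
import sys


def console_encoding(default="utf-8"):
    """Return a codec name for the console's output code page, or `default`."""
    if sys.platform != "win32":
        return default
    # GetConsoleOutputCP returns 0 on failure (for example, no console).
    code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
    if code_page == 0:
        return default
    return "cp%d" % code_page


def encode_for_console(text):
    """Encode a unicode string for the console, replacing unencodable chars."""
    try:
        return text.encode(console_encoding(), "replace")
    except LookupError:
        # Python has no codec for this code page; fall back to UTF-8.
        return text.encode("utf-8", "replace")
```

On a console switched to Japanese, the reported code page would normally be 932, so the text would be encoded with Python's cp932 codec.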

Mark Tolonen