I use an anonymous pipe to capture all stdout and stderr output and then print it into a rich edit control. That works fine when I use wsprintf, but Python writes multibyte characters, which really annoys me. How can I convert all of this output to Unicode?

UPDATE 2010-01-03:

Thanks for the replies. However, it seems str.encode() only helps with print xxx statements. If an error occurs during py_runxxx(), my redirected stderr captures the error message as a multibyte string. Is there a way to make Python output its messages as Unicode? There seems to be a workable solution in this post:

http://stackoverflow.com/questions/1956142/how-to-redirect-stderr-in-python

I'll try it later.

A: 

You can work with Unicode in Python either by marking string literals as Unicode (i.e. u'Hello World') or by calling the encode() method on a Unicode string.

E.g., assuming you have a Unicode string, aStringVariable:

aStringVariable.encode('utf-8')

will convert it to UTF-8. 'utf-16' will give you UTF-16, and 'ascii' will convert it to a plain old ASCII string (this raises UnicodeEncodeError if the string contains non-ASCII characters).
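As a rough illustration of the encodings mentioned above, here is a minimal sketch written for Python 3, where every str is already Unicode. The sample text and variable names are made up for the example.

```python
# Hypothetical sample text containing one non-ASCII character (e with acute accent).
text = u"h\u00e9llo"

# Encoding turns Unicode text into bytes in the chosen encoding.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")  # includes a byte order mark

# Plain ASCII cannot represent the accented letter, so a strict encode fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit error handler substitutes '?' instead of raising.
ascii_fallback = text.encode("ascii", "replace")

# Decoding with the matching encoding recovers the original text.
round_trip = utf8_bytes.decode("utf-8")
```

The direction matters: encode() goes from Unicode text to bytes, and decode() goes from bytes back to Unicode text.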

For more information, see:

Adam Luchjenbroers
1. It is a bad practice to shadow builtin names (`str()` in this case). 2. `.encode()` should be called on Unicode string and not on byte-string.
J.F. Sebastian
That was just a bad choice for a variable name. I've changed it to something more obvious.
Adam Luchjenbroers
A: 

[incorrect self-answer removed, text copied as an update to the question]

fancyzero
A: 

wsprintf? This seems to be a C/C++ question rather than a Python question. The Python interpreter always writes bytestrings to stdout/stderr, rather than Unicode (or "wide") strings. This means Python first encodes all Unicode data using the current encoding (likely sys.getdefaultencoding()). If you want to get at stdout/stderr as Unicode data, you must decode it yourself using the right encoding. Your favourite C/C++ library certainly has what it takes to do that.
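The decoding step can be sketched in Python itself, written here for Python 3. This is only an illustration under assumptions: the child script and the helper name run_and_decode are hypothetical, and the sketch pins the child's stream encoding with the PYTHONIOENCODING environment variable, which is not something the thread itself proposes. A C/C++ host would do the equivalent decoding with its own library.

```python
import os
import subprocess
import sys

# Hypothetical child script; the \u escapes keep the command line pure ASCII.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.write('caf\\u00e9\\n')\n"
    "raise ValueError('\\u4e2d\\u6587')\n"
)

def run_and_decode(code, encoding="utf-8"):
    """Run `code` in a child interpreter and return (stdout, stderr) as text."""
    # PYTHONIOENCODING pins the encoding the child uses for its standard streams.
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )
    raw_out, raw_err = proc.communicate()  # both are byte strings
    # Decode with the same encoding the child was told to use.
    return raw_out.decode(encoding), raw_err.decode(encoding)

out_text, err_text = run_and_decode(CHILD_CODE)
```

Because the traceback goes through the same encoded stderr stream, the error message from the uncaught exception is decoded along with everything else.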

Antoine P.
A: 

First, please remember that the Windows console may not fully support Unicode.

The example below makes Python write to stderr and stdout using UTF-8. You can change it to another encoding if you want.

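#!/usr/bin/python
# -*- coding: UTF-8 -*-

import codecs, sys

# Python 2 only: setdefaultencoding() is removed from sys at startup,
# so reload(sys) is needed to get it back.
reload(sys)
sys.setdefaultencoding('utf-8')

print sys.getdefaultencoding()

# Wrap both streams so that Unicode text is encoded as UTF-8 on output.
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points."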
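To see what the codecs.getwriter wrapper does without touching the real console, here is a small sketch written for Python 3. It is not part of the original answer: an in-memory io.BytesIO stands in for the pipe, and the helper name make_utf8_writer is hypothetical.

```python
import codecs
import io

def make_utf8_writer(byte_stream):
    """Wrap a byte stream so that Unicode text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)

# In-memory byte stream standing in for the redirected stdout/stderr pipe.
pipe_stand_in = io.BytesIO()
writer = make_utf8_writer(pipe_stand_in)

# Greek letters alpha, beta, gamma, written as Unicode text.
writer.write(u"\u03b1\u03b2\u03b3\n")

# What the reading end of the pipe would receive: UTF-8 encoded bytes.
captured = pipe_stand_in.getvalue()

# The reader decodes with the same encoding to get Unicode text back.
decoded = captured.decode("utf-8")
```

The reading side only needs to know which encoding the writer was created with in order to recover the text.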
Sorin Sbarnea