Having learned how to read Unicode files in a Python 3.0 web script, I now need to learn how to use print() with Unicode.

I searched for how to write Unicode; for example, this question explains that you can't write Unicode characters to a non-Unicode console. In my case, however, the output goes to Apache, which I am sure can handle Unicode text. For some reason, though, the stdout of my web script uses the ascii codec.

Obviously, if I were opening a file for writing myself, I would do something like

open(filename, 'w', encoding='utf8')

but since I'm given an open stream, I resorted to using

sys.stdout.buffer.write(mytext.encode('utf-8'))

and everything seems to work. Does this violate some rule of good behavior, or does it have any unintended consequences?
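One consequence worth checking is output ordering: the text layer of stdout keeps its own buffer, so bytes written straight to sys.stdout.buffer can overtake text that was print()ed earlier but not yet flushed. The sketch below is only an illustration of that point. The helper name write_utf8 and the in-memory fake_stdout are made up for the demo, and the header line is just sample text.

```python
import io
import sys

def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes to the binary layer under a text stream.

    write_utf8 is a hypothetical helper, not part of the standard library.
    """
    stream = sys.stdout if stream is None else stream
    stream.flush()                          # push out pending text first
    stream.buffer.write(text.encode('utf-8'))
    stream.buffer.flush()

# Demo on an in-memory stand-in for an ascii-configured stdout.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding='ascii', newline='\n')
fake_stdout.write('Content-Type: text/plain; charset=utf-8\n\n')
write_utf8('caf\u00e9\n', fake_stdout)
result = raw.getvalue()
```

Without the first flush() call, the header would still be sitting in the text layer's buffer when the body bytes reach the underlying stream.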

+5  A: 

I don't think you're breaking any rule, but

sys.stdout = codecs.EncodedFile(sys.stdout, 'utf8')

looks like it might be handier / less clunky.

Edit: per comments, this isn't quite right -- @Miles gave the right variant (thanks!):

sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
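As a rough illustration of what that wrapper does, here is a small sketch that uses an in-memory io.BytesIO in place of sys.stdout.buffer, so it can be run without touching the real stdout. The names raw, writer and encoded exist only for this demo.

```python
import codecs
import io

raw = io.BytesIO()                       # stands in for sys.stdout.buffer
writer = codecs.getwriter('utf8')(raw)   # text in, UTF-8 bytes out

# print() only needs an object with a write() method that accepts str.
print('na\u00efve caf\u00e9', file=writer)

encoded = raw.getvalue()
```

In a real script the wrapper would be assigned to sys.stdout as shown above, after which plain print() calls emit UTF-8.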

Edit: if you can arrange for the environment variable PYTHONIOENCODING to be set to utf8 when Apache starts your script, that would be even better, since sys.stdout would then use utf8 automatically; if that's infeasible or impractical, the codecs solution stands.
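To see the effect of the variable without involving Apache, one option is to start a child interpreter with it set and look at what the child reports. This is only a sketch and assumes a reasonably recent Python 3, since subprocess.run does not exist in 3.0; the variable names are arbitrary.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child only.
env = dict(os.environ, PYTHONIOENCODING='utf8')

child = subprocess.run(
    [sys.executable, '-c',
     'import sys; print(sys.stdout.encoding); print("caf\\u00e9")'],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# First line: the encoding the child chose; second line: the encoded text.
lines = child.stdout.splitlines()
```

The child never touches sys.stdout itself, which is the attraction of this route: the script stays at the print() level.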

Alex Martelli
With this line I get "TypeError: can't write bytes to text stream"
ilya n.
I think it's because stdout starts already being a text stream with a *wrong* ascii codec.
ilya n.
Try: sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
Miles
@Miles, you have it just right -- hope you don't mind if I edit my answer to include your better idea...!
Alex Martelli
No problem. I didn't make my own answer because I'm not sure what constitutes "best practice" for a lot of Python 3 encoding issues. One thing I don't like is that, if all references to the original stdout TextIOWrapper are lost (if sys.__stdout__ is overwritten, for instance), the underlying buffer will be closed, and there is no way around that, AFAICT, other than to make sure a reference is maintained.
Miles
To be quite honest: nobody (including us, Python core committers) is sure what's "best practice" in Python 3 either, YET -- we're all still figuring it out!-). So another +1 on your latest comment...;-)
Alex Martelli
Thanks to all! That works, although I'm still a bit scared -- we the simple folk were taught to use the highest level abstraction possible...
ilya n.
Using "the highest _feasible_ level of abstraction" is a good rule of thumb. If you can arrange for the environment variable PYTHONIOENCODING to be set to 'utf8' when Apache runs your code, that would be even better, and I'm editing the answer to reflect that; but how to arrange it is more of a sysadmin problem (httpd.conf? wrapper shell script?), so I'm not getting into that.
Alex Martelli
When I use this answer (@Miles's), and then call the builtin input('a prompt'), it fails with "AttributeError: 'BufferedWriter' object has no attribute 'encoding'" from codecs.py. (I'm using Python 3.0.) Perhaps I'm doing something obviously dumb, being new to Python 3. My workaround: print the prompt in a separate statement and use no-arg input().
Darius Bacon
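The AttributeError above comes from the codecs writer not having an encoding attribute of its own, so the lookup falls through to the underlying BufferedWriter. A text-layer object from the io module does carry that attribute. As a sketch only, and assuming Python 3.7 or later (where TextIOWrapper.reconfigure exists), the encoding of an existing text stream can be switched in place. The in-memory stream below is a stand-in for sys.stdout.

```python
import io

raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding='ascii', newline='\n')

# Switch the existing text stream to UTF-8 instead of replacing it.
stream.reconfigure(encoding='utf-8')

print('caf\u00e9', file=stream)
stream.flush()

new_encoding = stream.encoding
written = raw.getvalue()
```

On the real stream this would read sys.stdout.reconfigure(encoding='utf-8'); on older versions, wrapping sys.stdout.buffer in io.TextIOWrapper(..., encoding='utf-8') is a comparable option.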