When I try to print a unicode string on my dev server it works correctly, but the production server raises an exception.

File "/home/user/twistedapp/server.py", line 97, in stringReceived
    print "sent:" + json
File "/usr/lib/python2.6/dist-packages/twisted/python/log.py", line 555, in write
    d = (self.buf + data).split('\n')
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 28: ordinal not in range(128)

Actually, it is a Twisted application, and print forwards to the log file.

The repr() of the strings is the same on both servers. The locale is set to en_US.UTF-8.

Are there any configs I need to check to make it work the same on both servers?

+6  A: 

Printing of Unicode strings relies on sys.stdout (the process's standard output) having a correct .encoding attribute that Python can use to encode the unicode string into a byte string to perform the required printing -- and that setting depends on the way the OS is set up, where standard output is directed to, and so forth.

If there's no such attribute, the default codec, ascii, is used, and, as you've seen, it often does not provide the desired results;-).

You can check getattr(sys.stdout, 'encoding', None) to see if the encoding is there (if it is, you can just keep your fingers crossed that it's correct... or, maybe, try some heavily platform-specific trick to guess at the correct system encoding to check;-). If it isn't, in general, there's no reliable or cross-platform way to guess what it could be. You could try 'utf8', the universal encoding that works in a lot of cases (surely more than ascii does;-), but it's really a spin of the roulette wheel.
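As a rough sketch of that check, the helper below (the name `pick_output_encoding` and the `'utf8'` fallback are illustrative assumptions, not part of any library) returns the stream's declared encoding when there is one and falls back otherwise. It treats a missing attribute and an attribute set to None the same way:

```python
import sys

def pick_output_encoding(stream=None, fallback='utf8'):
    """Hypothetical helper: the stream's declared encoding, or `fallback`."""
    if stream is None:
        stream = sys.stdout
    # The attribute may be missing entirely (a file-like log object) or
    # present but None (stdout redirected to a file on Python 2).
    return getattr(stream, 'encoding', None) or fallback
```

The fallback is still a guess, as noted above; the helper only makes the guess explicit and keeps it in one place.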

For more reliability, your program should have its own configuration file to tell it what output encoding to use (maybe with 'utf8' just as the default if not otherwise specified).
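One possible shape for such a configuration lookup is sketched here. The `[output]` section, the `encoding` option, and the function name `load_output_encoding` are all made up for illustration; the import fallback is only there so the snippet runs on both Python 2 and Python 3:

```python
try:
    import configparser                  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2

def load_output_encoding(path, default='utf8'):
    """Hypothetical helper: read [output] encoding from an INI file."""
    parser = configparser.RawConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if parser.has_option('output', 'encoding'):
        return parser.get('output', 'encoding')
    # No file, no section, or no option: use the default codec.
    return default
```

With a file containing `[output]` followed by `encoding = latin-1`, the function would return `'latin-1'`; without the file it returns `'utf8'`.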

It's also better, for portability, to perform your own encoding, that is, not

print someunicode

but rather

print someunicode.encode(thecodec)

and actually, if you'd rather have incomplete output than a crash,

print someunicode.encode(thecodec, 'ignore')

(which simply skips non-encodable characters), or, usually better,

print someunicode.encode(thecodec, 'replace')

(which uses question-mark placeholders for non-encodable characters).
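To make the difference between the error handlers concrete, here is a small sketch. `safe_encode` is a hypothetical wrapper and the sample string is invented; the snippet avoids print so that it behaves the same on Python 2 and Python 3:

```python
def safe_encode(text, codec, errors='replace'):
    """Hypothetical wrapper around unicode.encode with an explicit handler."""
    return text.encode(codec, errors)

sample = u'caf\xe9'  # the last character has no ASCII equivalent

ignored = safe_encode(sample, 'ascii', 'ignore')    # b'caf'
replaced = safe_encode(sample, 'ascii', 'replace')  # b'caf?'

try:
    safe_encode(sample, 'ascii', 'strict')
    strict_failed = False
except UnicodeEncodeError:
    # The default 'strict' handler raises instead of producing output.
    strict_failed = True
```

Defaulting to 'replace' here follows the "usually better" remark above: the placeholder at least shows that something was lost.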

Alex Martelli
I think it's worth mentioning that on UNIX systems, sys.stdout.encoding is set based on the `LANG`, `LC_ALL` and `LC_CTYPE` environment variables, and that it is *only* set if sys.stdout is connected to a terminal. The same working prints can unfortunately break when you redirect output to a file or another program. This makes it even more important to explicitly encode your unicode.
Thomas Wouters
@Thomas, yep, absolutely, excellent point!
Alex Martelli
It doesn't work because print outputs to the logs. I updated my question. Thanks for your response. The locale is set to en_US.UTF-8 on both servers.
Gregory Lo
I randomly tried different encodings, and mystring.decode('utf8') seems to work on the production server. But it raises an exception on the dev server:
Gregory Lo
@Alex Never experienced this problem, but great answer. I'm sure this info will come in handy in the future :)
Michael Mior
@Gregory, so, on the dev, use `mystring.encode(sys.stdout.encoding)` (definitely **NOT** `.decode` as you say you're doing, that's simply crazy and will work under **no** circumstances -- what are you trying to accomplish that way?!), `'utf8'` on the prod. `getattr(sys.stdout, 'encoding', 'utf8')` will give you the right codec name on either system (and **do** try the `'replace'` for added safety, too!).
Alex Martelli
Actually, `unicode.decode()` *will* work if the unicode happens to be encodable in ASCII :-) But as soon as it contains non-ASCII data, *poof*.
Thomas Wouters
@Thomas, it "works", at best, by doing no operation, i.e., _no_ "work"; so, while you can hope it won't actually blow you away, it still, even in those lucky accidental cases, doesn't _work_ (by any sensible definition of "work";-).
Alex Martelli
+1  A: 

Unicode is not supported by Twisted's built-in log observers. See http://twistedmatrix.com/trac/ticket/989 for progress on adding support for this, or to see what you can do to help out.

Until #989 is resolved and the fix is in a Twisted release your application is deployed on, do not log unicode. Only log str.
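A minimal sketch of "only log str", assuming Python 2 as in the question (where str is a byte string): the hypothetical helper `to_log_str` encodes unicode to bytes before it reaches print or the log, and passes byte strings through untouched. The `'utf8'` default and the 'replace' handler are assumptions, not something Twisted prescribes:

```python
def to_log_str(message, codec='utf8'):
    """Hypothetical helper: always hand the logger a byte string."""
    if isinstance(message, bytes):  # on Python 2, bytes is an alias of str
        return message
    # Unicode text: encode explicitly rather than relying on the
    # implicit ascii conversion that fails in the traceback above.
    return message.encode(codec, 'replace')
```

For example, `to_log_str(u'\u0441')` gives the two UTF-8 bytes `'\xd1\x81'`, while a value that is already a byte string comes back unchanged.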

Jean-Paul Calderone
Why might it work differently on the different servers?
Gregory Lo
It would be OK if I just needed to encode or decode or something. But the production server requires decode('utf8'), and the dev server doesn't allow it.
Gregory Lo