Printing of Unicode strings relies on sys.stdout (the process's standard output) having a correct .encoding attribute that Python can use to encode the unicode string into a byte string for printing -- and that setting depends on how the OS is set up, where standard output is directed, and so forth. If there's no such attribute, the default codec, ascii, is used, and, as you've seen, it often does not provide the desired results;-).
You can check getattr(sys.stdout, 'encoding', None) to see if the encoding is there (if it is, you can just keep your fingers crossed that it's correct... or, maybe, try some heavily platform-specific trick to guess at the correct system encoding;-). If it isn't, there's in general no reliable or cross-platform way to guess what it could be. You could try 'utf8', the universal encoding that works in a lot of cases (surely more than ascii does;-), but it's really a spin of the roulette wheel.
For more reliability, your program should have its own configuration file to tell it what output encoding to use (maybe with 'utf8' as the default if not otherwise specified).
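One way that might look (the config dict and 'output_encoding' key are made-up names for illustration): read the codec from your own configuration, default to 'utf8', and validate it up front so a typo fails loudly rather than at print time:

```python
import codecs

# Minimal sketch with hypothetical config keys: fetch the program's
# configured output encoding, defaulting to 'utf8' when unspecified,
# and validate it immediately via codecs.lookup.
def output_codec_from_config(config):
    name = config.get('output_encoding', 'utf8')
    codecs.lookup(name)  # raises LookupError for an unknown codec name
    return name
```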
It's also better, for portability, to perform your own encoding, that is, not
print someunicode
but rather
print someunicode.encode(thecodec)
and actually, if you'd rather have incomplete output than a crash,
print someunicode.encode(thecodec, 'ignore')
(which simply skips non-encodable characters), or, usually better,
print someunicode.encode(thecodec, 'replace')
(which uses question-mark placeholders for non-encodable characters).
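To see the difference between the two error handlers, try them on a string that ascii can't encode:

```python
# A character outside ascii: u'café' (é is \xe9).
sample = u'caf\xe9'

# 'ignore' silently drops the non-encodable character.
assert sample.encode('ascii', 'ignore') == b'caf'

# 'replace' substitutes a question mark for it.
assert sample.encode('ascii', 'replace') == b'caf?'
```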