I have a problem with unichr() on my server. Please see below:

On my server (Ubuntu 9.04):

>>> print unichr(255)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)

On my desktop (Ubuntu 9.10):

>>> print unichr(255)
ÿ

I'm fairly new to Python, so I don't know how to solve this. Can anyone help? Thanks.

+2  A: 

The terminal settings on your server are different, probably set to 7-bit US ASCII.

unwind
+2  A: 

It's not really unichr() related. The problem is the locale setting in your server environment: it's probably set to something like en_US, which is not Unicode-aware.
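
To check this diagnosis, it may help to compare what Python reports on each machine. The following is only a sketch: describe_output_encoding is a made-up name for this example, and it relies on nothing beyond sys.stdout.encoding, locale.getpreferredencoding() and the LANG and LC_ALL environment variables. It is written so that it should run on both Python 2 and Python 3.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the settings that influence how print encodes Unicode text.

    Hypothetical helper for illustration; not part of any library.
    """
    return {
        # Encoding attached to stdout; in Python 2 this can be None
        # when output is piped or redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Environment variables the locale is usually derived from.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_output_encoding().items()):
        print("%s: %s" % (name, value))
```

If the server reports an ASCII-like value (glibc names plain ASCII ANSI_X3.4-1968) where the desktop reports UTF-8, the locale is the likely cause.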

Łukasz
That's probably it, since I get the same result when I execute the code through a .py file. Do you know what I need to do to change this?
jacob
Try $ export LANG="en_US.UTF-8" (the character encoding comes from LANG or LC_ALL; LANGUAGE only selects message translations)
Łukasz
+1  A: 

Consider using an explicit encoding when printing Unicode strings in environments where the OS settings are not uniform.

unicode.encode([encoding[, errors]])

Return an encoded version of the string. Default encoding is the current default string encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Codec Base Classes. For a list of possible encodings, see section Standard Encodings.

For example,

>>> print unichr(0xff).encode('iso8859-1')
ÿ
>>> 
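
To make the error handlers described above concrete, here is a small sketch that encodes the same character to ASCII once per handler. The function name encode_with_handlers is invented for this illustration; the handler names are the ones listed in the documentation quoted above. It is written so that it should run on both Python 2 and Python 3.

```python
text = u"\xff"  # the character that unichr(255) produces


def encode_with_handlers(value, encoding="ascii"):
    """Encode value once per error handler and collect the results.

    Hypothetical helper for illustration; not part of any library.
    """
    handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
    return dict((name, value.encode(encoding, name)) for name in handlers)


if __name__ == "__main__":
    try:
        text.encode("ascii")  # the default handler is 'strict'
    except UnicodeEncodeError as exc:
        print("strict: %s" % exc)
    for name, encoded in sorted(encode_with_handlers(text).items()):
        print("%s: %r" % (name, encoded))
```

With 'strict' the call raises the same UnicodeEncodeError as in the question; the other handlers drop the character, substitute a question mark, or emit an escaped form.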
gimel
+2  A: 

When you use the "print" statement, you are writing to the sys.stdout output stream. If sys.stdout has no encoding, or only an ASCII one, it can display a Unicode string only when every character can be converted to ASCII, which is the conversion str(message) performs.

To print anything else, you need to encode the string to your terminal's encoding before printing it.

The locale module can sometimes detect the encoding of the output console:

import locale
print unichr(0xff).encode(locale.getdefaultlocale()[1], 'replace')

but it's usually better to specify the encoding yourself, as Python often gets it wrong:

print unichr(0xff).encode('latin-1', 'replace')

I think most modern Linux distros use UTF-8 or latin-1.
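
The two approaches above can be combined into one helper, sketched below. The name encode_for_console and its fallback order are assumptions made for this example: an explicitly passed encoding wins, then whatever sys.stdout reports, then locale.getpreferredencoding(). It is written so that it should run on both Python 2 and Python 3.

```python
import locale
import sys


def encode_for_console(text, encoding=None, errors="replace"):
    """Return text encoded for the console without raising on odd characters.

    Hypothetical helper for illustration; not part of any library.
    """
    chosen = (
        encoding                                  # explicit choice wins
        or getattr(sys.stdout, "encoding", None)  # what stdout reports, if anything
        or locale.getpreferredencoding(False)     # otherwise ask the locale
    )
    return text.encode(chosen, errors)
```

In Python 2 it would be used as print encode_for_console(unichr(0xff)). Because errors defaults to 'replace', an ASCII-only console shows a question mark where it would otherwise raise an exception.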

If you know the encoding of your console, the lines below will encode Unicode strings automatically when you use "print":

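import sys
import codecs
ENCODING = 'utf-8'  # placeholder: use your console's actual encoding
# Wrap stdout so Unicode strings are encoded automatically on print (Python 2).
sys.stdout = codecs.getwriter(ENCODING)(sys.stdout)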

If the encoding is ASCII or something similar, you may need to change the console encoding of your OS to be able to display that character.
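
To see what the codecs.getwriter() wrapper actually does, the sketch below writes through it into an in-memory byte buffer and inspects the bytes that would reach the console. The io.BytesIO buffer is a stand-in for a byte-oriented stream such as Python 2's sys.stdout, and write_encoded is a name invented for this example. It is written so that it should run on both Python 2 and Python 3.

```python
import codecs
import io


def write_encoded(text, encoding, errors="strict"):
    """Write text through a codecs StreamWriter and return the raw bytes.

    Hypothetical helper for illustration; io.BytesIO stands in for the
    real output stream.
    """
    raw = io.BytesIO()
    # getwriter() returns a StreamWriter class; instantiate it around the stream.
    writer = codecs.getwriter(encoding)(raw, errors)
    writer.write(text)
    return raw.getvalue()


if __name__ == "__main__":
    for encoding in ("utf-8", "latin-1"):
        print("%s: %r" % (encoding, write_encoded(u"\xff", encoding)))
    # With an ASCII console the strict default would raise, so use 'replace'.
    print("ascii: %r" % write_encoded(u"\xff", "ascii", "replace"))
```

The writer accepts an optional error handler as its second argument, so an ASCII-only console can be made to degrade to question marks.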

See also: http://wiki.python.org/moin/PrintFails

David Morrissey