I'm trying to work my way through some frustrating encoding issues by going back to basics. In Dive Into Python, example 9.14, we have this:

>>> s = u'La Pe\xf1a'
>>> print s
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
>>> print s.encode('latin-1')
La Peña

But on my machine, this happens:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> s = u'La Pe\xf1a'
>>> print s
La Peña

I don't understand why these are different. Anybody?

+6  A: 

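The encoding that print uses for unicode objects doesn't depend on sys.getdefaultencoding(), but on sys.stdout.encoding. If sys.stdout is a terminal, print uses the terminal's encoding, which is why the ñ displays correctly on your machine. If you launch Python with e.g. LANG=C, the encoding for stdout will be ANSI_X3.4-1968 (that is, ASCII). If you redirect a Python script's output to a file, sys.stdout.encoding is None in Python 2 and print falls back to the default ASCII encoding. In both of those cases, print s raises the error shown in the book.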
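To check which encoding applies in a given environment, something like the following could be used. It is only a sketch: `effective_stdout_encoding` is a hypothetical helper name, not part of any library, and the fallback to sys.getdefaultencoding() is an assumption modelled on what Python 2 does when the stream reports no encoding. It is written to run on both Python 2 and Python 3.

```python
import sys


def effective_stdout_encoding(stream=None):
    """Return the encoding expected to apply to text written to `stream`.

    Hypothetical helper for illustration only. If the stream has no
    `encoding` attribute, or it is None (as for pipes and redirected
    files in Python 2), fall back to the interpreter default.
    """
    if stream is None:
        stream = sys.stdout
    return getattr(stream, 'encoding', None) or sys.getdefaultencoding()


# Run once in a terminal and once redirected to a file to compare.
print(effective_stdout_encoding())
```

Running the script once directly and once with output redirected (for example `python script.py > out.txt`) should show whether the two cases differ on your system.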

As for what sys.getdefaultencoding() does: it's used when implicitly converting between str and unicode. In this example, str(u'La Pe\xf1a') fails with the default ASCII encoding, but with the default encoding changed to Latin-1 it would encode the string to Latin-1. However, changing the default encoding is a horrible idea; you should always encode explicitly when you want to go from unicode to str.
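A short sketch of explicit encoding, using the string from the question. The variable names are illustrative only; the snippet runs on Python 2 and on Python 3.3 or later, where the u'' prefix is accepted.

```python
s = u'La Pe\xf1a'

# Explicit encoding: the caller names the target charset.
latin1_bytes = s.encode('latin-1')   # one byte for the n-tilde (0xF1)
utf8_bytes = s.encode('utf-8')       # two bytes for the n-tilde (0xC3 0xB1)

# ASCII cannot represent U+00F1, so encoding to ASCII raises an error.
try:
    s.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Naming the codec at each conversion keeps the result independent of the terminal and of the interpreter-wide default.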

Lukáš Lalinský