ansaurus

Question

Python Unicode strings and the Python interactive interpreter

Answer 1

+1 A:

The interpreter uses your command prompt's native encoding for text entry. In your case it's CP437:

>>> print '\xa4'.decode('cp437')
ñ

Ignacio Vazquez-Abrams 2010-03-10 22:35:03

Answer 2

A:

You're getting confused because the editor and the interpreter are using different encodings. The Python interpreter reads your input in the console's encoding (in this case, cp437), while your editor uses UTF-8.

Note, the difference disappears if you specify a unicode string, like so:

# Windows python interpreter
>>> s = "La caña de España"
>>> s
'La ca\xa4a de Espa\xa4a'
>>> s = u"La caña de España"
>>> s
u'La ca\xf1a de Espa\xf1a'

The moral of the story? Encodings are tricky. Be sure you know what encoding your source files are in, or play it safe by always using the escaped version of special characters.
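As a rough illustration of the "escaped version" advice, here is a minimal sketch. The variable names are made up for the example, and the two codecs are simply the ones mentioned in this thread. Because the literal contains only ASCII characters, it means the same thing whatever encoding the source file or console uses; the bytes differ only once you encode it:

```python
from __future__ import print_function

# Escaped form of the string from the session above: \xf1 is the code
# point U+00F1 (n with tilde), so the literal itself is pure ASCII.
text = u"La ca\xf1a de Espa\xf1a"

# The same text turned into bytes under the two encodings discussed here.
as_cp437 = text.encode("cp437")
as_utf8 = text.encode("utf-8")

print(repr(as_cp437))
print(repr(as_utf8))
print(len(text), len(as_cp437), len(as_utf8))
```

The CP437 bytes match the '\xa4' form shown in the session above, while UTF-8 needs two bytes for each ñ.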

Chris B. 2010-03-10 22:49:14

Answer 3

+1 A:

Let me expand on Ignacio's reply: In both cases there is an extra layer between Python and you: in one case it is Sublime Text and in the other it's cmd.exe. The difference in behaviour you see is not due to Python but to the different encodings used by Sublime Text (utf-8, as it seems) and cmd.exe (cp437).

So, when you type ñ, Sublime Text sends '\xc3\xb1' to Python, whereas cmd.exe sends '\xa4'. [I'm simplifying here, omitting details that are not relevant to the question.]
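To make the two byte sequences concrete, here is a small sketch (the names are illustrative, and it only models the encoding step, not what Sublime Text or cmd.exe actually do internally). It also shows what happens if the UTF-8 bytes are read with the wrong codec:

```python
from __future__ import print_function

n_tilde = u"\xf1"  # U+00F1, LATIN SMALL LETTER N WITH TILDE

utf8_bytes = n_tilde.encode("utf-8")    # bytes a UTF-8 front end would produce
cp437_bytes = n_tilde.encode("cp437")   # bytes a CP437 console would produce

# Decoding each sequence with its own codec recovers the same character.
same_char = utf8_bytes.decode("utf-8") == cp437_bytes.decode("cp437") == n_tilde

# Decoding the UTF-8 bytes as CP437 yields two unrelated characters instead.
misread = utf8_bytes.decode("cp437")

print(repr(utf8_bytes), repr(cp437_bytes), same_char, repr(misread))
```

The bytes are only meaningful together with the encoding that produced them, which is why the same keystroke looks different in the two environments.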

Still, Python knows about that. From cmd.exe you'll probably get something like:

>>> import sys
>>> sys.stdin.encoding
'cp437'

whereas within Sublime Text you'll get something like:

>>> import sys
>>> sys.stdin.encoding
'utf-8'
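If you want to turn raw input bytes into a unicode string yourself, one possible approach is to decode with whatever sys.stdin.encoding reports. The helper below is a hypothetical sketch, not something from the answers above; decode_input and its UTF-8 fallback are assumptions made for the example:

```python
from __future__ import print_function
import sys


def decode_input(raw, encoding=None):
    """Decode bytes with the given encoding, else stdin's, else UTF-8."""
    if encoding is None:
        # sys.stdin.encoding may be None (for example when input is piped),
        # so fall back to UTF-8 as an arbitrary default for this sketch.
        encoding = getattr(sys.stdin, "encoding", None) or "utf-8"
    return raw.decode(encoding)


# The same character, arriving as different bytes from the two front ends.
from_cmd = decode_input(b"\xa4", "cp437")
from_editor = decode_input(b"\xc3\xb1", "utf-8")
print(repr(from_cmd), repr(from_editor), from_cmd == from_editor)
```

Passing the encoding explicitly, as in the two calls above, keeps the result independent of the environment the script happens to run in.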

krawyoti 2010-03-10 23:01:32
