ansaurus

Question

Answer 1

On Unix systems, it should be in the user's locale encoding, which is (strangely) not tied to sys.getdefaultencoding(). See http://docs.python.org/library/locale.html.

On Windows, it'll be in the system ANSI code page.
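
A minimal sketch of this advice, assuming Python 2-style arguments that arrive as raw bytes. `locale.getpreferredencoding(False)` typically reports the user's locale encoding on Unix and the ANSI code page (e.g. `cp1252`) on Windows. The helper name `decode_arg` is hypothetical, not a standard function.

```python
import locale


def decode_arg(raw, encoding=None):
    """Decode one raw (bytes) command-line argument to text.

    Hypothetical helper, not a standard function: when no encoding is
    given it uses the locale's preferred encoding, as this answer suggests.
    """
    if encoding is None:
        # False = don't call setlocale(); just report the current setting
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


# The encoding this machine would use by default (e.g. 'UTF-8' or 'cp1252')
print(locale.getpreferredencoding(False))
print(decode_arg(b"hello"))
```

In Python 3 the interpreter performs this decoding itself: on Unix it uses the locale encoding with the `surrogateescape` error handler, so `os.fsencode(arg)` normally recovers the original bytes, and on Windows it reads the Unicode command line directly. A helper like this is therefore mainly relevant to Python 2 code like the sessions in this thread.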

(By the way, those elementary school teachers who told you not to end a sentence with a preposition were lying to you.)

Glenn Maynard 2010-10-25 07:34:14

Answer 2

I don't know if this helps or not, but this is what I get in DOS mode (the Command Prompt window):

C:\Python27>python Lib\codingtest.py нер
['Lib\\codingtest.py', '\xed\xe5\xf0']

C:\Python27>python Lib\codingtest.py hello
['Lib\\codingtest.py', 'hello']

In IDLE:

>>> print "hello"
hello
>>> "hello"
'hello'
>>> "привет"
'\xef\xf0\xe8\xe2\xe5\xf2'
>>> print "привет"
привет
>>> sys.getdefaultencoding()
'ascii'
>>>

What can we deduce from this? I don't know yet... I'll comment in a little bit.

A little bit later: sys.argv appears to be encoded with sys.stdin.encoding, not sys.getdefaultencoding().
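
The transcript can be checked directly. Decoding the bytes Python printed with `cp1251` (the Windows Cyrillic code page, and the `sys.stdin.encoding` value reported in the follow-up comment) gives back the typed Cyrillic, while the `ascii` default encoding cannot decode them at all. A sketch in Python 3 syntax, using only the bytes copied from the output above:

```python
# Byte strings copied from the Python 2.7 session above
from_console = b"\xed\xe5\xf0"           # sys.argv[1] after typing нер
from_idle = b"\xef\xf0\xe8\xe2\xe5\xf2"  # the literal "привет" typed in IDLE

# cp1251 is the Windows Cyrillic code page
print(from_console.decode("cp1251"))  # нер
print(from_idle.decode("cp1251"))     # привет

# sys.getdefaultencoding() was 'ascii', which rejects every byte >= 0x80
try:
    from_console.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii cannot decode it:", exc.reason)
```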

Soulseekah 2010-10-25 07:46:02

`print sys.stdin.encoding` gives `cp1251`

Soulseekah 2010-10-25 07:55:02

`\xef` is the CP1251 (Windows Cyrillic) encoding of CYRILLIC SMALL LETTER PE ('п'), so I'm beginning to believe that `sys.argv` is encoded with `sys.stdin.encoding` and not `sys.getdefaultencoding()`.

Soulseekah 2010-10-25 08:02:39
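
The byte-to-character claim in that last comment can be confirmed with the standard `unicodedata` module: byte `0xEF` in CP1251 decodes to code point U+043F.

```python
import unicodedata

# Decode the single byte 0xEF with the Windows Cyrillic code page
char = b"\xef".decode("cp1251")
print(char, hex(ord(char)), unicodedata.name(char))
# п 0x43f CYRILLIC SMALL LETTER PE
```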

Answer 3

A few observations:

(1) It's certainly not sys.getdefaultencoding.

(2) sys.stdin.encoding appears to be a much better bet.

(3) On Windows, the actual value of sys.stdin.encoding will vary, depending on what software is providing the stdio. IDLE will use the system "ANSI" code page, e.g. cp1252 in most of Western Europe and America and former colonies thereof. However in the Command Prompt window, which emulates MS-DOS more or less, the corresponding old DOS code page (e.g. cp850) will be used by default. This can be changed by using the CHCP (change code page) command.

(4) The documentation for the subprocess module doesn't provide any suggestions on what encoding to use for args and stdout.

(5) One trusts that `assert sys.stdin.encoding == sys.stdout.encoding` never fails.
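
Point (3) matters because the ANSI and console (OEM) code pages assign different bytes to the same character. A small sketch using the two Western European code pages named in (3); note that decoding with the wrong one does not raise an error, it just yields a different character:

```python
# The same character has different byte values in the Windows "ANSI" code page
# (cp1252) and the Western European console/OEM code page (cp850).
text = "é"
ansi_bytes = text.encode("cp1252")
oem_bytes = text.encode("cp850")
print(ansi_bytes, oem_bytes)  # b'\xe9' b'\x82'

# Decoding with the wrong code page silently produces a different character
# rather than raising an error.
misread = oem_bytes.decode("cp1252")
print(misread)
```

Python 3 on Windows sidesteps this for `sys.argv` by reading the Unicode command line, but data exchanged with other console programs through pipes can still depend on the code page they use.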

John Machin 2010-10-25 09:38:42

The observations seem correct; I have observed the same. Do you have any idea what exactly sys.getdefaultencoding returns?

anand 2010-10-25 09:55:34

"It returns the name of the current default string encoding used by the Unicode implementation." I think it means that Python uses the defaultencoding() in its console. You can avoid relying on defaultencoding() by writing unicode literals with the `u'` prefix, by the way. Great answer +1

Soulseekah 2010-10-25 11:38:33
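
As for what `sys.getdefaultencoding()` actually governs: it names the codec Python uses for implicit conversions between byte strings and Unicode (in Python 2, for example, adding a `unicode` to a non-ASCII `str`, or calling `str.decode()` with no argument). It is not the console encoding and not the encoding of `sys.argv`, which is why it reports `'ascii'` in the IDLE session above while the Cyrillic bytes are clearly cp1251. A `u` prefix creates a unicode object directly, so no implicit conversion happens, but it does not change the default. In Python 3 the default is always `'utf-8'`, as this sketch shows:

```python
import sys

# In Python 3 the default encoding is fixed at UTF-8
print(sys.getdefaultencoding())

# str.encode() and bytes.decode() with no argument use UTF-8
text = "привет"
data = text.encode()
assert data == text.encode("utf-8")
assert data.decode() == text

# The console streams have their own, separate encodings (possibly None)
print(getattr(sys.stdin, "encoding", None))
print(getattr(sys.stdout, "encoding", None))
```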

@John: I agree about (2); I thought of it later. (5) is actually not true: under Unix, `python test.py > test.txt` can, for instance, have UTF-8 for the stdin encoding and None for the stdout encoding (in Python 2, a redirected stdout has no encoding).

EOL 2010-10-25 15:32:46
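
Following EOL's counterexample to (5), a defensive pattern is to read each stream's `encoding` attribute separately and fall back to the locale encoding when it is missing or `None`. `stream_encoding` is a hypothetical helper, not a standard function, and `SimpleNamespace` merely stands in for a Python 2 redirected stream:

```python
import locale
import sys
from types import SimpleNamespace


def stream_encoding(stream):
    """Return stream.encoding, or the locale's preferred encoding if it is missing/None."""
    enc = getattr(stream, "encoding", None)
    return enc or locale.getpreferredencoding(False)


# Report what this process actually sees; stdin and stdout need not agree
for name in ("stdin", "stdout", "stderr"):
    print(name, stream_encoding(getattr(sys, name)))

# A stand-in for Python 2's redirected stdout, whose encoding was None
redirected = SimpleNamespace(encoding=None)
print(stream_encoding(redirected))
```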


Python: in what encoding is sys.argv?