I'm trying to open a file and just realized that Python is having trouble with my username, which is in Russian. Any suggestions on how to properly encode or decode the path to make IDLE happy?

I'm using Python 2.6.5.

xmlfile = open(u"D:\\Users\\Эрик\\Downloads\\temp.xml", "r")

Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    xmlfile = open(str(u"D:\\Users\\Эрик\\Downloads\\temp.xml"), "r")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128)

os.sys.getfilesystemencoding() returns 'mbcs'.

xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r")

Traceback (most recent call last):
  File "", line 1, in
    xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r")
IOError: [Errno 22] invalid mode ('r') or filename: 'D:\Users\Y?ee\Downloads\temp.xml'

A: 

The first problem is that the parser interprets backslashes in string literals as escape sequences unless you use the r"raw string" prefix. In 2.6.5 you needn't treat your Unicode string specially, but you may need a source encoding declaration at the top of your file, such as:

# -*- coding: utf-8 -*-

as defined in PEP 263. Here is an example of it working interactively:

$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) [GCC 4.4.3] on linux2
>>> f = r"D:\Users\Эрик\Downloads\temp.xml"
>>> f
'D:\\Users\\\xd0\xad\xd1\x80\xd0\xb8\xd0\xba\\Downloads\\temp.xml'
>>> x = open(f, 'w')
>>> x.close()
>>> 
$ ls D*
D:\Users\Эрик\Downloads\temp.xml

Yes, this is on a Unix system, so the \ isn't meaningful as a path separator, and my terminal encoding is UTF-8, but it works. You may just have to give the parser the coding hint when it reads a source file.
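
As a minimal sketch of the coding hint in a script rather than at the interactive prompt, the following assumes the source file is saved as UTF-8. The throwaway temporary directory is made up for illustration so the example does not depend on D:\Users existing. It is written against Python 3, where UTF-8 is already the default source encoding; on 2.6 the declaration is what makes the literal decode correctly, and other details may differ.

```python
# -*- coding: utf-8 -*-
# The declaration above must be the first or second line of the file (PEP 263).
import io
import os
import tempfile

# With the declaration in place, the parser decodes this literal correctly.
user_name = u"Эрик"
assert user_name == u"\u042d\u0440\u0438\u043a"

# Hypothetical location: a throwaway directory instead of the real profile.
directory = os.path.join(tempfile.mkdtemp(), user_name)
os.mkdir(directory)
path = os.path.join(directory, u"temp.xml")

# Pass the Unicode path straight to open(); no str() and no manual encode().
with io.open(path, "w", encoding="utf-8") as xmlfile:
    xmlfile.write(u"<root/>")

with io.open(path, "r", encoding="utf-8") as xmlfile:
    content = xmlfile.read()
```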

msw
A: 

First problem:

xmlfile = open(u"D:\\Users\\Эрик\\Downloads\\temp.xml", "r")
### The above line should be OK, provided that you have the correct coding line
### For example # coding: cp1251

Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    xmlfile = open(str(u"D:\\Users\\Эрик\\Downloads\\temp.xml"), "r")
### HOWEVER, the traceback line above shows that you actually called str(),
### which is DIRECTLY causing the error: it attempts to encode your
### filename using the default ASCII codec -- DON'T DO THAT.
### Please copy/paste; don't type from memory.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128)
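
The failure in this traceback can be reproduced without touching the file system. In Python 2, calling str() on a unicode object implicitly encodes it with the default ASCII codec. The sketch below makes that step explicit with .encode("ascii") so that it behaves the same on Python 3, where str() no longer does this.

```python
# The user name from the question, written with escapes so the snippet
# does not depend on the source file's encoding.
user_name = u"\u042d\u0440\u0438\u043a"
path = u"D:\\Users\\" + user_name + u"\\Downloads\\temp.xml"

try:
    # What Python 2's str(path) does behind the scenes.
    path.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The offending slice is exactly the Cyrillic part of the path.
bad_part = path[error.start:error.end]
```

The reported range (error.start up to, but not including, error.end) corresponds to the "position 9-12" in the message above.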

Second problem:

os.sys.getfilesystemencoding() produces 'mbcs'

xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r")
### (a) \t is interpreted as a TAB character, hence the file name is invalid.
### (b) encoding with mbcs seems not to be useful; it messes up your name ("Y?ee").

Traceback (most recent call last):
  File "", line 1, in
    xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r")
IOError: [Errno 22] invalid mode ('r') or filename: 'D:\Users\Y?ee\Downloads\temp.xml'
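
How encoding a file name to a byte code page can lose characters, as suggested in (b), can be sketched without a Windows machine. The mbcs codec exists only on Windows and stands for the system's ANSI code page. Here cp1252 (Western European) and cp1251 (Cyrillic) are stand-ins chosen for illustration, not values taken from the question, and whether this is what actually happened there cannot be told from the traceback alone.

```python
user_name = u"\u042d\u0440\u0438\u043a"  # the Cyrillic name from the question

# A code page without Cyrillic letters cannot represent the name at all;
# the "replace" error handler substitutes a question mark per character.
lossy = user_name.encode("cp1252", "replace")

# A Cyrillic code page round-trips the name without loss.
cyrillic_bytes = user_name.encode("cp1251")
restored = cyrillic_bytes.decode("cp1251")
```

Passing the unicode path directly to open() sidesteps the question of which code page applies.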

General advice on hard-coding filenames in Windows, in descending order of preference:

(1) Don't
(2) Use / e.g. "c:/temp.xml"
(3) Use raw strings with backslashes r"c:\temp.xml"
(4) Use doubled backslashes "c:\\temp.xml"
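
Options (2) to (4) in the list above can be compared side by side, together with the plain literal that causes the TAB problem. The sketch uses ntpath, the Windows flavour of os.path that can be imported on any platform, so that it also runs outside Windows.

```python
import ntpath

forward = "c:/temp.xml"    # (2) forward slashes
raw = r"c:\temp.xml"       # (3) raw string
doubled = "c:\\temp.xml"   # (4) doubled backslashes
broken = "c:\temp.xml"     # plain literal: "\t" silently becomes a TAB

# (3) and (4) produce the same string.
same_spelling = raw == doubled

# (2) normalizes to that same string under Windows path rules.
normalized = ntpath.normpath(forward)

# The broken literal is one character shorter and contains a TAB.
has_tab = "\t" in broken
```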

John Machin