I'm trying to run subprocess.call() with a Unicode filename. Here is a simplified version of the problem:

n = u'c:\\windows\\notepad.exe '
f = u'c:\\temp\\nèw.txt'

subprocess.call(n + f)

which raises the well-known error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'

Encoding to utf-8 produces the wrong filename, and mbcs passes the filename as new.txt, without the accent.

I just can't read any more on this confusing subject; I keep going in circles. I have found a lot of answers here to many different problems in the past, so I thought I'd join and ask for help myself.

Thanks

A: 

I don't have an answer for you, but I've done a fair amount of research into this problem. Python converts all output (including system calls) to the same character encoding as the terminal it is running in. Windows terminals use code pages for character mapping; the default code page is 437, but it can be changed with the chcp command. chcp 65001 will theoretically change the code page to UTF-8, but as far as I know Python doesn't know what to do with this, so you're SOL.

starbuck
+1  A: 

It appears that to make this work, the subprocess code would have to be modified to use a wide character version of CreateProcess (assuming that one exists). There's a PEP discussing the same change made for the file object at http://www.python.org/dev/peps/pep-0277/. Perhaps you could research the Windows C calls and propose a similar change for subprocess.

clahey
I don't feel up to the task of researching this problem, though it's funny to see the PEP's author (Neil), who just released SciTE 2.10 with support for Unicode (wide char) file name access.
otrov
A: 

You can try opening the file as:

subprocess.call((n + f).encode("cp437"))

or whichever codepage chcp reports as being used in a command prompt window. If you try to chcp 65001 as starbuck suggested, you'll have to edit the stdlib encodings\aliases.py file and add cp65001 as an alias to 'utf-8' beforehand. It's an open issue in the Python source.

UPDATE: since this is a multiple target scenario, before running such a command, make sure you run a single chcp command first, analyse the output and retrieve the current "Command Prompt" (DOS) codepage. Subsequently, use the discovered codepage to encode the subprocess.call argument.
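
The "run chcp, analyse the output" step described above could be sketched roughly as follows. The function names parse_chcp_output and console_codepage are made up for illustration, and the sketch assumes that chcp prints a (possibly localized) message containing the code page number as its last run of digits, e.g. "Active code page: 437".

```python
import re
import subprocess


def parse_chcp_output(text):
    """Return a codec name such as 'cp437' from chcp output, or None."""
    # Assumption: the code page number is the last run of digits
    # in the message, whatever the localized wording around it is.
    numbers = re.findall(r"\d+", text)
    return "cp" + numbers[-1] if numbers else None


def console_codepage():
    """Run chcp through the shell and parse its output (Windows only)."""
    output = subprocess.check_output("chcp", shell=True)
    # Drop any localized non-ASCII text; only the digits matter here.
    return parse_chcp_output(output.decode("ascii", "ignore"))
```

On a Windows machine the result would then be used as in the answer above, e.g. subprocess.call((n + f).encode(console_codepage())). Note that the encode step still fails if the discovered code page has no mapping for a character in the filename.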

ΤΖΩΤΖΙΟΥ
I'm on cp1251, but the program is supposed to run on different machines with arbitrary locales.
otrov
cp1251 is the Windows codepage. When running commands with subprocess, you need to use the "DOS"/command prompt codepage.
ΤΖΩΤΖΙΟΥ
A: 

As ΤΖΩΤΖΙΟΥ and starbuck mentioned, the problem is with the console code page, which in your case is 866 (in the Russian localization of Windows) and not 1251. Just run chcp in the console.

The problem is the same as when you want to output Unicode to the Windows console. Unfortunately, you will need at least to register an alias for unicode as 'cp866' in encodings\aliases.py (or do it programmatically on script start), change the code page of the console to 65001 before running notepad, and set it back afterwards.

chcp 65001 & c:\WINDOWS\notepad.exe nèw.txt & chcp 866

By the way, to be able to run the command in the console and see the filename correctly, you will need to change the console font to Lucida Console in the console window properties.

It might be even worse: you may need to change the code page of the current process. To do that, you will need either to run chcp 65001 right before the script starts or to use pywin32 to do it within the script.

newtover
otrov
A: 
import win32api
f = win32api.GetShortPathName(f)

The file must exist, though. It's a dirty workaround, but it will work. pywin32 is available at http://sourceforge.net/projects/pywin32/

WGH