ansaurus

Question

Unicode filenames on Windows with Python & subprocess.Popen()

Answer 1

+1 A:

The docs for sys.getfilesystemencoding() say that on Windows NT and later, file names are natively Unicode. If you have a valid Unicode file name, why would you bother encoding it using mbcs?

The docs for the codecs module say that mbcs encodes using the "ANSI code page" (which differs depending on the user's locale), so if that code page has no Cyrillic characters, splat: the name can't be encoded.
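A rough illustration of the code-page point, assuming Python 3 and using the explicit codecs cp1251 (a Cyrillic ANSI code page) and cp1252 (a Western European one) as stand-ins for whatever mbcs maps to on a given machine. The file name and the helper survives_code_page are made up for the example:

```python
# Hypothetical Cyrillic file name, written with escapes so the example
# does not depend on the encoding of this source file.
name = "\u0444\u0430\u0439\u043b.txt"


def survives_code_page(text, code_page):
    """Return True if text round-trips through the given code page."""
    try:
        return text.encode(code_page).decode(code_page) == text
    except UnicodeEncodeError:
        return False


print(survives_code_page(name, "cp1251"))       # True
print(survives_code_page(name, "cp1252"))       # False
# Lossy variant: every character the code page lacks becomes "?".
print(name.encode("cp1252", errors="replace"))  # b'????.txt'
```

On a real Windows machine the mbcs codec may, depending on the Python version, substitute characters instead of raising, which would make the failure silent rather than loud.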

Edit: So your process is calling subprocess.Popen(). If the invoked process is under your control, the two processes should be able to agree to use UTF-8 as the Unicode Transformation Format. Otherwise, you may need to ask on the pywin32 mailing list. In any case, edit your question to state the degree of control you have over the invoked process.
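If the invoked process really is under your control, one way to make the two sides agree on UTF-8 is to send the name over a pipe as bytes instead of on the command line. A minimal sketch, assuming Python 3; the child program here is a hypothetical stand-in that only decodes the name and echoes it back:

```python
import subprocess
import sys

# Hypothetical child process: reads raw bytes from stdin, decodes them as
# UTF-8, and writes the name back as UTF-8 so the parent can check it.
CHILD_SOURCE = (
    "import sys\n"
    "name = sys.stdin.buffer.read().decode('utf-8')\n"
    "sys.stdout.buffer.write(name.encode('utf-8'))\n"
)


def send_name_as_utf8(name):
    """Pass a file name to the child over stdin as UTF-8 bytes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=name.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    cyrillic_name = "\u0444\u0430\u0439\u043b.txt"  # made-up file name
    print(send_name_as_utf8(cyrillic_name) == cyrillic_name)
```

Because only bytes cross the pipe, neither the ANSI code page nor the command-line encoding is involved.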

John Machin 2009-12-15 21:17:48

I have a file on my system with a Cyrillic name. I want to call subprocess.Popen() with that file as an argument. Popen won't handle Unicode. Normally I can get away with encoding the argument with the codec given by sys.getfilesystemencoding(). In this case it won't work.

Norman 2009-12-15 23:30:03

@Norman: please edit your question to include the info about subprocess.Popen()

John Machin 2009-12-16 04:20:52

Answer 2

A:

If you need to pass the name of an existing file, then you might have a better chance of success by passing the 8.3 version of the Unicode filename.

You need to have the pywin32 package installed, then you can do:

>>> import win32api
>>> win32api.GetShortPathName(u"C:\\Program Files")
'C:\\PROGRA~1'

I believe these short filenames use only ASCII characters, and therefore you should be able to use them as arguments to a command line.

Should you also need to specify names of files that are yet to be created, you can create them in advance from Python as zero-size files using their Unicode names, then pass each file's short name as the argument.

UPDATE: user bogdan says correctly that 8.3 filename generation can be disabled (I had it disabled, too, when I had Windows XP on my laptop), so you can't rely on them. So, as another more far-fetched approach when working on NTFS volumes, one can hard link the Unicode filenames to plain ASCII ones; pass the ASCII filenames to an external command and delete them afterwards.
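The hard-link workaround could look roughly like this, assuming Python 3.7 or later (where os.link works on Windows and str.isascii exists) and a parent directory whose own path is ASCII-only. ascii_alias is a hypothetical helper name, not part of any library:

```python
import contextlib
import os
import uuid


@contextlib.contextmanager
def ascii_alias(unicode_path):
    """Temporarily hard link unicode_path to an ASCII-only name beside it."""
    directory = os.path.dirname(os.path.abspath(unicode_path))
    extension = os.path.splitext(unicode_path)[1]
    if not extension.isascii():
        extension = ""  # keep the extension only when it is plain ASCII
    alias = os.path.join(directory, "alias_" + uuid.uuid4().hex + extension)
    os.link(unicode_path, alias)  # both names now refer to the same file
    try:
        yield alias
    finally:
        os.remove(alias)  # removes only the extra name, not the file data
```

Inside a `with ascii_alias(path) as alias:` block, the alias can be handed to the external command in place of the original name; the link disappears when the block exits, even if the command fails.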

ΤΖΩΤΖΙΟΥ 2009-12-29 01:54:30

You should never try to use 8.3 filenames. Please remember that they are optional and can be missing: it's common practice to disable NTFS short filename generation in order to speed up the filesystem.

bogdan 2010-01-04 15:14:19

If I may object to your first subsentence: one can *try* using 8.3 filenames, but should *not rely* on them. Ergo my "you might have a better chance".

ΤΖΩΤΖΙΟΥ 2010-01-04 23:46:31
