Why does the following occur:
>>> u'\u0308'.encode('mbcs') # COMBINING DIAERESIS (umlaut)
'\xa8'
>>> u'\u041A'.encode('mbcs') #CYRILLIC CAPITAL LETTER KA
'?'
>>>
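For comparison, here is the Cyrillic character encoded with two explicit single-byte code pages (a minimal sketch; cp1251 and cp1252 are my assumptions about a typical Cyrillic and a typical Western-European ANSI code page, not something I've verified on the affected users' machines):

>>> u'\u041A'.encode('cp1251')  # a Cyrillic code page can represent KA
'\xca'
>>> u'\u041A'.encode('cp1252')  # a Western-European code page cannot
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'charmap' codec can't encode character u'\u041a' in position 0: character maps to <undefined>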
I have a Python application that accepts filenames from the operating system. It works for some international users, but not for others.
For example, this unicode filename: u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'
will not encode with the Windows 'mbcs' codec (the one used by the filesystem and returned by sys.getfilesystemencoding()). I just get '???????', which means the encoder fails on every character in the name. But that makes no sense, since the filename came from the user's own system to begin with.
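Here is a minimal repro of that failure on my machine (Python 2 on Windows; the filename is the one above):

>>> import sys
>>> sys.getfilesystemencoding()
'mbcs'
>>> u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'.encode(sys.getfilesystemencoding())
'???????'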
Update: here's the background to why I'm asking... I have a file on my system whose name is in Cyrillic, and I want to pass that file as an argument to subprocess.Popen(). Popen won't handle unicode arguments. Normally I can get away with encoding the argument with the codec returned by sys.getfilesystemencoding(); in this case that doesn't work.
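For concreteness, this is roughly the call I'm trying to make (a sketch; converter.exe is a made-up program name, and the encode step is the one that produces the '???????'):

import subprocess, sys

filename = u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'  # name as it comes from the filesystem/user
arg = filename.encode(sys.getfilesystemencoding())        # 'mbcs' turns it into '???????'
subprocess.Popen(['converter.exe', arg])                  # the byte string no longer names the file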