In my Django application, a user has uploaded a file with a unicode character in the name.

When a file is downloaded, I call:

os.path.exists(media)

to test that the file is there. This, in turn, seems to call:

st = os.stat(path)

which then fails with the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xcf' in position 92: ordinal not in range(128)

What can I do about this? Is there an option to path.exists to handle it?

Update: all I had to do was encode the argument to exists, i.e.

os.path.exists(media.encode('utf-8'))

Thanks to everyone who answered.
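
The fix from the update can be wrapped in a small helper. This is only a minimal sketch, not part of Django or the standard library: `path_exists` and its `encoding` parameter are hypothetical names, and falling back to `sys.getfilesystemencoding()` when no encoding is given is an assumption based on the advice to use the filesystem encoding. Byte strings are passed through unchanged.

```python
import os
import sys


def path_exists(path, encoding=None):
    """Return True if `path` exists, encoding text paths to bytes first.

    Hypothetical helper: `encoding` defaults to the interpreter's
    filesystem encoding, with UTF-8 as a last resort.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding() or "utf-8"
    # Byte strings (Python 2 `str`, Python 3 `bytes`) are already encoded.
    if not isinstance(path, bytes):
        path = path.encode(encoding)
    return os.path.exists(path)
```

Calling `path_exists(media, encoding='utf-8')` would then behave like the one-line fix above, assuming the names on disk really are stored as UTF-8.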

A: 

Encode the path to the filesystem encoding before calling os.path.exists. See the locale module.

Ignacio Vazquez-Abrams
Thanks for this, but I'm not sure I follow. Are you saying that I can tell Django to adapt an uploaded file's name? I don't see anything about this in the locale module.
interstar
You have to use the native system's encoding to refer to files. Try `locale.nl_langinfo(locale.CODESET)`.
Ignacio Vazquez-Abrams
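
To see which encoding the suggestion in the comment above would give, something like the following could be used. It is a sketch under a few assumptions: `filesystem_encodings` is a hypothetical name, `locale.nl_langinfo` is only available on Unix-like platforms (hence the fallback), and calling `locale.setlocale` changes the locale for the whole process.

```python
import locale
import sys


def filesystem_encodings():
    """Return (locale_codeset, python_fs_encoding) for this process."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, ...).
        # Python 2 starts in the "C" locale until this is called.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The requested locale is not installed; keep the current one.
        pass
    if hasattr(locale, "nl_langinfo"):
        codeset = locale.nl_langinfo(locale.CODESET)
    else:
        # nl_langinfo is missing on some platforms (e.g. Windows).
        codeset = locale.getpreferredencoding(False)
    return codeset, sys.getfilesystemencoding()


print(filesystem_encodings())
```

If the first value names an ASCII codeset rather than UTF-8, that would be consistent with the 'ascii' codec error in the question.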
A: 

I'm assuming you're on Unix. If not, please say which OS you're using.

Make sure your locale is set to UTF-8. Most modern Linux distributions do this by default, usually by setting the environment variable LANG to "en_US.UTF-8" or the equivalent for another language. Also, make sure your filenames are encoded in UTF-8.

With that set, there's no need to mess with encodings to access files in any language, even in Python 2.x.

[~/test] echo $LANG
en_US.UTF-8
[~/test] echo testing > 漢字
[~/test] python2.6
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.stat("漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> os.stat(u"漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> open("漢字").read()
'testing\n'
>>> open(u"漢字").read()
'testing\n'

If this doesn't work, run "locale". If the values are "C" instead of "en_US.UTF-8", the locale may not be installed or configured correctly.
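
The same check can be approximated from inside Python, which may help when the process (a web server, say) runs with a different environment than your shell. This is a rough sketch: both function names are hypothetical, and it only inspects environment variables in the usual POSIX order (LC_ALL, then LC_CTYPE, then LANG). It does not verify that the locale is actually installed.

```python
import os


def effective_ctype_locale(environ=None):
    """Return the locale name that governs character encoding.

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG. Empty values
    count as unset, and "C" is the fallback.
    """
    if environ is None:
        environ = os.environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            return value
    return "C"


def looks_like_utf8(locale_name):
    """True if the codeset part (after the dot) names UTF-8."""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


name = effective_ctype_locale()
print(name, looks_like_utf8(name))
```

Logging this once at startup would show whether the serving process sees a UTF-8 locale or falls back to "C".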

If you're on Windows, I think Unicode filenames should always just work (at least for the os/posix modules), since Python supports the Windows Unicode file API transparently.

Glenn Maynard