ansaurus

Question

Unicode filenames on Python 2.6 under Mac OS X

Answer 1

+2 A:

Note that os.walk() yields filenames without their path. If you want to open a file, you must prepend the dirpath that was yielded along with it:

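import os

# directory is the start point of the scan
for dirpath, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        # filename has no path component, so join it with dirpath
        path = os.path.join(dirpath, filename)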

Aaron Digulla 2009-12-11 10:42:51

Ouch. That hurts. Does that API date back to something like 1970 or so?

Joey 2009-12-11 10:46:22

Not really. Python 2.6 has two string types: one is byte-based and the other is Unicode-based (16-bit). No filesystem in the world supports Unicode, but some can handle UTF-8 encoded names (Linux or Windows, for example). The main difference on Windows is that it has an API to which you can pass Unicode strings, and it does the conversion internally. In Python, you have to do it in your own code (up to version 3.0). This is mainly to support many OSs.

Aaron Digulla 2009-12-11 10:51:51

However, even when encoded as a bytestring, I get IOError: [Errno 2] No such file or directory: '01 \xe7\xa9\xba\xe5\x8d\xb3\xe6\x98\xaf\xe8\x89\xb2.mp3'

Ripdog 2009-12-11 10:52:12

Aaron: NTFS uses UTF-16 for file names. Exclusively. The Windows APIs also use only UTF-16 for that purpose, so Python already converts there. OS X uses UTF-8 in NFD, IIRC, so normalization may have to be done within Python (unless the Unicode string is already normalized). I also didn't mean the byte-string/Unicode-string dichotomy in Python (I think it's a bad idea, but I know about it). It's more that if your language supports Unicode strings, you can expect its APIs to handle them too.

Joey 2009-12-11 10:57:47
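
To illustrate the normalization point in the comment above, here is a minimal sketch using the standard `unicodedata` module. The helper name `same_filename` and the sample names are made up for illustration, and the comparison uses plain NFD, which may differ in detail from what the OS X filesystem actually stores.

```python
import unicodedata

def same_filename(a, b, form="NFD"):
    """Compare two text names after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Hypothetical names: the same visible text in two different encodings.
composed = u"caf\u00e9.mp3"      # e-acute as one precomposed code point (NFC)
decomposed = u"cafe\u0301.mp3"   # 'e' followed by a combining acute accent (NFD)

print(composed == decomposed)               # compares the raw strings
print(same_filename(composed, decomposed))  # compares the normalized strings
```

If a name typed into the script never matches a name read from disk, normalizing both sides to one form before comparing is one thing to try.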

@Ripdog: Okay... Is that file in the directory from which you started the script? Otherwise, you forgot to prepend the path. Try `path = os.path.join(directory, filename)` in the loop in `startScan()`.

Aaron Digulla 2009-12-11 10:59:27

Well, that's pretty embarrassing. It seems that you are right, but now I have a new problem. How do I get the full pathname of files from walk() when the files are 3 directories deep from the walk start point?

Ripdog 2009-12-11 11:09:37

Read the docs for `os.walk` carefully: http://docs.python.org/library/os.html#os.walk. The first element of each tuple it yields is the path for all items in the second and third lists.

Aaron Digulla 2009-12-11 11:33:47
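
A small sketch of what that means for files several directories below the start point. The function name `list_files` and the directory layout are invented for the example, and the scratch tree is built with `tempfile` only so the snippet runs on its own.

```python
import os
import tempfile

def list_files(start):
    """Return the full path of every file below start, at any depth."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(start):
        for filename in filenames:
            # dirpath is the directory currently being visited, so joining
            # it with filename works no matter how deep the file is.
            paths.append(os.path.join(dirpath, filename))
    return paths

# Hypothetical layout for illustration: one file three directories deep.
root = tempfile.mkdtemp()
deep_dir = os.path.join(root, "a", "b", "c")
os.makedirs(deep_dir)
deep_file = os.path.join(deep_dir, "song.mp3")
open(deep_file, "w").close()

found = list_files(root)
print(found)
```

Joining with the start directory instead of `dirpath` would only produce valid paths for files directly inside the start directory.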

-1: this seems wrong. `os.walk` will output unicode if started with unicode, and `open` will accept unicode. If it doesn't work, try `io.open` as well.

kaizer.se 2009-12-11 12:09:17

@kaizer.se: Is that also true for Python 2.6? IIRC, this only works with Python 3+.

Aaron Digulla 2009-12-11 12:52:40

I don't know about os.walk, but I know that os.listdir() returns unicode if you feed it unicode, and byte strings if you feed it a byte string. We can assume it's the same for os.walk.

Virgil Dupras 2009-12-11 13:02:05
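
The type-in, type-out behaviour described above can be checked directly. This sketch is written for Python 3, where the text type is `str` and the byte type is `bytes`; on Python 2.6 the corresponding pair would be `unicode` and `str`, and `os.fsencode` is not available there. The scratch directory and the file name are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch directory containing a single file.
root = tempfile.mkdtemp()
open(os.path.join(root, "track.mp3"), "w").close()

# A text path in gives text names out; a bytes path in gives bytes names out.
text_names = os.listdir(root)
byte_names = os.listdir(os.fsencode(root))

# os.walk follows the same rule for the names it yields.
walk_text = [f for _, _, files in os.walk(root) for f in files]
walk_bytes = [f for _, _, files in os.walk(os.fsencode(root)) for f in files]

print(type(text_names[0]), type(byte_names[0]))
print(type(walk_text[0]), type(walk_bytes[0]))
```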

Fixed my answer.

Aaron Digulla 2009-12-11 13:21:11
