Python 2.6.5 is said to support Unicode, so why doesn't `listdir()` return Unicode filenames in IDLE, while Python 3.1.2 does show Unicode in IDLE? (Tested on Windows 7.)

The following code shows the same behavior:

import os

# Raw string so the backslashes in the Windows path are taken literally.
for dirname, dirnames, filenames in os.walk(r'c:\path\somewhere'):
    for subdirname in dirnames:
        print(os.path.join(dirname, subdirname))
    for filename in filenames:
        print(os.path.join(dirname, filename))

Update: the Unicode characters are in the filenames, not in the path...

+3  A: 

The syntax for Unicode strings changed from Python 2 to Python 3. In Python 2, try specifying a Unicode string like this:

u'c:\\path\\somewhere'

If you want the syntax of Python 3 (string literals are by default Unicode unless the b prefix is given), use

from __future__ import unicode_literals

at the top of your file.
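As a rough sketch of how that could look for the walk in the question: the helper below makes sure the root is a Unicode string before handing it to `os.walk`, so the names that come back are Unicode too. The name `walk_unicode` is hypothetical (it is not part of `os`), and the fallback to UTF-8 when the filesystem encoding is unknown is an assumption.

```python
from __future__ import unicode_literals

import os
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
text_type = type("")


def walk_unicode(root):
    """Yield every path under root as a Unicode string.

    walk_unicode is a hypothetical helper name, not part of os.
    """
    if not isinstance(root, text_type):
        # A byte-string root would make Python 2 return byte-string names,
        # so decode it first (assumption: fall back to UTF-8 if unknown).
        root = root.decode(sys.getfilesystemencoding() or "utf-8")
    for dirname, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirname, name)
```

Calling `for path in walk_unicode('c:\\path\\somewhere'): print(path)` should then print the names themselves, provided the console can display the characters.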

Philipp
indeed. The key point being that in Python 2, you only get Unicode pathnames out of `listdir()` and related functions if you specifically ask for them by passing a Unicode string in. `os.listdir('.')` gives you different results from `os.listdir(u'.')`.
bobince
Interesting... it then shows "\u6c34 ...". Is there a way to show those as glyphs instead of escape codes?
動靜能量
What do you mean by "show"? At least the `print` function/statement should show them without escape characters. Otherwise, please post it as a new question, since it's not related to `os.walk`.
Philipp
In Python 3.1.2, those Unicode characters are shown as the glyphs themselves, not as \u6c34, if I do a listdir().
動靜能量
@Jian: You're only seeing it that way because the shell is showing you the `repr()` of the resulting strings. Just use `print` on them and (assuming a terminal that can display the characters) you'll see what you expect. E.g. `for s in os.listdir(u'path'): print s`
Nicholas Knight
+2  A: 

Python 3 makes all strings Unicode by default, which is probably why it works with Python 3 out of the box.

The documentation for listdir states

Changed in version 2.3: On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.

So I guess you have to give your path as a Unicode string explicitly in Python 2 to get the result as Unicode.
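Because the quoted note says undecodable names can still come back as byte strings, a Python 2 caller may get a mixed list even with a Unicode path. A minimal sketch of separating the two kinds follows; `split_by_type` is a hypothetical helper name, and on Python 3 the byte list is expected to stay empty for a Unicode path.

```python
import os


def split_by_type(names):
    """Partition listdir() results into (text_names, byte_names).

    split_by_type is a hypothetical helper, not part of os.
    """
    text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3
    text_names = [n for n in names if isinstance(n, text_type)]
    byte_names = [n for n in names if not isinstance(n, text_type)]
    return text_names, byte_names


# Unicode path in, so decodable names come back as Unicode.
text_names, byte_names = split_by_type(os.listdir(u"."))
```

Anything that lands in `byte_names` could not be decoded and needs separate handling, for example logging it rather than printing it.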

Joey
A: 

Python 2.x supports unicode but unicode isn't the default (as it is for 3.x).

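In Python 2.x, strings are 8-bit byte arrays by default, so you'll see the encoded bytes of the filenames when you work with the filesystem: typically UTF-8 on Linux, and on Windows the ANSI code page, where characters it cannot represent come back as `?`.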

In Python 3.x, all strings are Unicode by default, so the decoding of filenames happens inside the I/O routines.
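To make the bytes-versus-text distinction concrete, here is a small sketch that assumes the filenames are stored as UTF-8, as this answer does. The filename is made up for illustration, using the \u6c34 character mentioned in the comments above.

```python
# Hypothetical filename: one CJK character plus an ASCII extension.
name = u"\u6c34.txt"

# Text -> bytes: roughly what a Python 2 byte string would hold.
encoded = name.encode("utf-8")

# Bytes -> text: the step Python 3 performs for you.
decoded = encoded.decode("utf-8")

print(len(name))        # 5 characters
print(len(encoded))     # 7 bytes, since the CJK character takes three
print(decoded == name)  # True
```

The escaped form `\u6c34` seen in the Python 2 shell is just the `repr()` of such a string; the text itself is the same either way.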

Aaron Digulla