I am using this code to recursively find files in a folder that are larger than 500000 bytes.

import os

def listall(parent):
    lis = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            if os.path.getsize(os.path.join(root, name)) > 500000:
                # Keep the full path of every file larger than 500000 bytes.
                lis.append(os.path.join(root, name))
    return lis

This works fine, but when I ran it on the 'Temporary Internet Files' folder in Windows, I got this error:

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    listall(a)
  File "<pyshell#2>", line 5, in listall
    if os.path.getsize(os.path.join(root,name))>500000:
  File "C:\Python26\lib\genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg'

I think this is because Windows gives the files in this specific folder names with special characters. Please help me sort out this issue.

+3  A: 

It's because the saved file ‘(something)+1[1].jpg’ has non-ASCII characters in its name, characters that don't fit into the ‘system default code page’ (also misleadingly known as ‘ANSI’).

Programs like Python that use the byte-based C standard library (stdio) file access functions have big problems with Unicode filenames. On other platforms they can just use UTF-8 and everyone's happy, but on Windows the system default code page is never UTF-8, so there will always be characters that can't be represented in the given encoding. They'll get replaced with ? or sometimes other similar-looking characters, and then when you try to read the files with mangled names you'll get errors like the above.

Which code page you get depends on your locale: on Western Windows installs it'll be cp1252 (similar to ISO-8859-1, ‘Latin-1’), so you'll only be able to use the characters that code page contains.
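
To make the mangling concrete, here is a small sketch that imitates it with Python's own cp1252 codec. The file name is made up (the real characters behind the question's ?????? are unknown), and Windows does its own conversion that can also pick look-alike characters, so treat this as an illustration of the idea rather than an exact reproduction:

```python
# Hypothetical file name containing characters that cp1252 cannot represent.
name = u'\u65e5\u672c\u8a9e+1[1].jpg'

# With errors='replace', every character missing from the code page
# is swapped for a question mark; the original name cannot be recovered.
mangled = name.encode('cp1252', 'replace')
print(mangled)

# A name that does fit in cp1252 survives the round trip unchanged.
fits = u'caf\u00e9.jpg'
roundtrip = fits.encode('cp1252').decode('cp1252')
print(roundtrip == fits)
```

Once the name has been reduced to question marks, asking the filesystem for that mangled name fails, which is what the traceback in the question shows.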

Luckily, reasonably recent versions of Python (2.3+, according to PEP 277) can also directly support Unicode filenames by using the native Win32 APIs instead of stdio. If you pass a Unicode string into os.listdir(), or into os.walk(), which calls it internally, Python will use these native-Unicode APIs and you'll get Unicode strings back, which will include the original characters in the filename instead of mangled ones. So if you call listall with a Unicode pathname:

listall(ur'C:\Documents and Settings\khedarnatha\Local Settings\Temporary Internet Files')

it should Just Work.
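
If you want to try this without touching the real folder, a rough, self-contained sketch along these lines may help. The scratch directory, the file name and the min_size parameter are all made up for the demo, and it assumes the filesystem can store the name. It uses a plain u'...' literal rather than ur'...' because Python 3 rejects the ur prefix:

```python
import os
import shutil
import sys
import tempfile


def listall(parent, min_size=500000):
    # Same walk as in the question; min_size is a hypothetical extra
    # parameter so the demo can use a tiny file.
    lis = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > min_size:
                lis.append(path)
    return lis


# Scratch directory standing in for 'Temporary Internet Files'.
demo_dir = tempfile.mkdtemp()
if isinstance(demo_dir, bytes):
    # Python 2 hands back a byte string; make it Unicode before walking.
    demo_dir = demo_dir.decode(sys.getfilesystemencoding())

# Hypothetical file name with characters that do not exist in cp1252.
demo_name = u'\u65e5\u672c\u8a9e+1[1].jpg'
with open(os.path.join(demo_dir, demo_name), 'wb') as f:
    f.write(b'x' * 10)

# Because demo_dir is Unicode, os.walk yields Unicode names as well.
result = listall(demo_dir, min_size=5)
shutil.rmtree(demo_dir)
print(result)
```

The only thing that matters for the fix is that the path handed to os.walk is a Unicode string; everything else is scaffolding for the demo.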

bobince
Thank you very much, Mr. Bobince. This was something I was really searching for. Now I am getting the file names as well, as I wanted, so I can also search files by name. Thank you once again :)
pythBegin
@pythBegin: Don't forget to accept the answer if it fixes your problem.
luc
I have accepted it, Luc.
pythBegin