
Is there a built-in function to find all the files under a particular directory, including files in its subdirectories? I have tried this code, but it's not working. Maybe the logic itself is wrong:

def fun(mydir):
    lis=glob.glob(mydir)
    length=len(lis)
    l,i=0,0
    if len(lis):
        while(l+i<length):
            if os.path.isfile(lis[i]):
                final.append(lis[i])
                lis.pop(i)
                l=l+1
                i=i+1
            else:
                i=i+1
            print final
        fun(lis)
    else:
        print final
A: 

os.walk() is what you need.

Marco Mariani
A: 

There is no built-in function, but using os.walk it's trivial to construct it:

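import os

def recursive_file_gen(mydir):
    # os.walk yields a (dirpath, dirnames, filenames) triple for mydir
    # and for every directory below it
    for root, dirs, files in os.walk(mydir):
        for filename in files:
            # join the directory and the bare file name into a full path
            yield os.path.join(root, filename)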

ETA: os.walk walks the directory tree recursively; recursive_file_gen is a generator (it uses the yield keyword to produce one file path at a time). To get the full list, do:

list(recursive_file_gen(mydir))
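A quick way to try the generator is against a throwaway directory tree. This sketch builds one with `tempfile`; the file names `a.txt` and `sub/b.txt` are made up for illustration:

```python
import os
import tempfile


def recursive_file_gen(mydir):
    """Yield the path of every file under mydir, descending into subdirectories."""
    for root, dirs, files in os.walk(mydir):
        for filename in files:
            yield os.path.join(root, filename)


# Build a small hypothetical tree: top/a.txt and top/sub/b.txt
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(top, rel), "w") as fh:
        fh.write("hello")

# sorted() drains the generator into a list; sorting gives a stable order,
# since os.walk does not promise any particular ordering.
all_files = sorted(recursive_file_gen(top))
for p in all_files:
    print(os.path.relpath(p, top))
```

Because the function is a generator, nothing is read from disk until the result is iterated, so a caller that only needs the first few files can stop early without walking the whole tree.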
SilentGhost
Thank you, SilentGhost! :) Can you please explain the code?
pythBegin
@pythbegin: added an explanation; do ask if any specific point is not clear.
SilentGhost
OK, but I didn't understand the yield part...
pythBegin
@pyth: there is a [formal definition in Python docs](http://docs.python.org/reference/simple_stmts.html#the-yield-statement).
SilentGhost
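For anyone who, like the asker, finds `yield` confusing: a function whose body contains `yield` returns a generator object when called, and its body only runs up to the next `yield` each time a value is requested. A minimal illustration (the name `count_up` is made up for this example):

```python
def count_up(limit):
    """Generator: produces 0, 1, ..., limit - 1, one value per request."""
    n = 0
    while n < limit:
        yield n  # pause here and hand n to the caller
        n += 1   # resume here when the next value is requested


gen = count_up(3)  # nothing in the body has run yet
first = next(gen)  # runs the body until the first yield
rest = list(gen)   # drains the remaining values
print(first)
print(rest)
```

`recursive_file_gen` works the same way: each `yield os.path.join(root, filename)` hands one path back to the caller, and `list(...)` simply keeps asking until the generator is exhausted.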
OK, I made some changes to your code, and it now looks like this:

def listall(parent):
    lis=[]
    for root, dirs, files in os.walk(parent):
        for name in files:
            if os.path.getsize(os.path.join(root,name))>500000:
                lis.append(os.path.join(root,name))
    return lis

My aim is to find all the files larger than 500000 bytes, and it is working properly. But when I used this function on the 'Temporary Internet Files' folder in Windows, I got an error. I think it's because of the special characters in the file name. Can you suggest something?
pythBegin
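Here is the `listall` function from the comment above, with comments added and the 500000-byte threshold turned into a parameter. `min_size` is a name introduced here for illustration; with its default value the behavior matches the original. The demo files at the end are hypothetical:

```python
import os
import tempfile


def listall(parent, min_size=500000):
    """Return paths of all files under parent larger than min_size bytes."""
    lis = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            full_path = os.path.join(root, name)  # build the path once, reuse it
            if os.path.getsize(full_path) > min_size:
                lis.append(full_path)
    return lis


# Hypothetical demo: one 600000-byte file and one tiny file in a temp dir.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "big.bin"), "wb") as fh:
    fh.write(b"x" * 600000)
with open(os.path.join(demo_dir, "small.txt"), "wb") as fh:
    fh.write(b"tiny")

big_files = listall(demo_dir)
print(big_files)
```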
Sorry, I forgot to mention the error:

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    listall(a)
  File "<pyshell#2>", line 5, in listall
    if os.path.getsize(os.path.join(root,name))>500000:
  File "C:\Python26\lib\genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg'

That's it.
pythBegin
@pyth: I suspect it has to do with the encoding of the file name. It's hard to say, since you didn't provide a sample of the file names. Clearly, `?` cannot actually be present in the file name, since it is not a valid character in Windows file names. Try to find out what the actual name of the file was: what did `os.walk` return, and what did `os.path.join` return? I'd suggest you ask a separate question, as this is beyond the scope of this one.
SilentGhost
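Following the suggestion to look at what `os.walk` actually returned, one option is to catch the error per file instead of letting it abort the whole walk, and to collect the offending paths for inspection. This is a sketch, not something from the thread: `listall_safe` and its `(big_files, failures)` return shape are made up here. On Windows, `WindowsError` is a subclass of (in Python 3, an alias for) `OSError`, so catching `OSError` covers it. Separately, on Python 2 under Windows, passing a unicode path such as `u'C:\\Documents and Settings'` to `os.walk` makes it return unicode file names, which is often enough to avoid the `?` substitution in the first place.

```python
import os
import tempfile


def listall_safe(parent, min_size=500000):
    """Like listall, but skip files whose size cannot be read.

    Returns (big_files, failures), where failures is a list of
    (path, error) pairs so the problem names can be inspected.
    """
    big_files = []
    failures = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            full_path = os.path.join(root, name)
            try:
                size = os.path.getsize(full_path)
            except OSError as exc:  # covers WindowsError on Windows
                failures.append((full_path, exc))
                continue
            if size > min_size:
                big_files.append(full_path)
    return big_files, failures


# Hypothetical POSIX-only demo: a dangling symlink makes getsize() fail,
# standing in for a file name that cannot be stat'ed.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "big.bin"), "wb") as fh:
    fh.write(b"x" * 600000)
dangling = os.path.join(demo_dir, "dangling")
os.symlink(os.path.join(demo_dir, "missing"), dangling)

big_files, failures = listall_safe(demo_dir)
for bad_path, exc in failures:
    print(repr(bad_path))  # repr() shows exactly which characters the name holds
    print(exc)
```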
OK, I'm posting a separate question for this.
pythBegin
A: 

I highly recommend this path module, written by Jason Orendorff:

http://pypi.python.org/pypi/path.py/2.2

Unfortunately, his website is down now, but you can still download the module from the link above (or through easy_install, if you prefer).

Using this path module, you can perform various operations on paths, including walking files as you requested. Here's an example:

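from path import path

my_path = path('.')

# every file under the current directory, recursively
for file in my_path.walkfiles():
    print file

# only files whose names match the glob pattern '*.pdf'
for file in my_path.walkfiles('*.pdf'):
    print file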
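If installing a third-party module is not an option, roughly the same effect as `walkfiles('*.pdf')` can be had from the standard library by combining `os.walk` with `fnmatch.filter`. This is a sketch of an equivalent, not how path.py implements it; the name `walk_matching` and the demo files are made up:

```python
import fnmatch
import os
import tempfile


def walk_matching(top, pattern="*"):
    """Yield paths of files under top whose names match a shell-style pattern."""
    for root, dirs, files in os.walk(top):
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)


# Hypothetical demo tree: one PDF in a subdirectory, one text file at the top.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "docs"))
for rel in (os.path.join("docs", "report.pdf"), "notes.txt"):
    with open(os.path.join(demo_dir, rel), "w") as fh:
        fh.write("placeholder")

pdfs = list(walk_matching(demo_dir, "*.pdf"))
print(pdfs)
```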

There are also convenience methods for many other path operations:

In [1]: from path import path

In [2]: my_dir = path('my_dir')

In [3]: my_file = path('readme.txt')

In [5]: print my_dir / my_file
my_dir/readme.txt

In [6]: joined_path = my_dir / my_file

In [7]: print joined_path
my_dir/readme.txt

In [8]: print joined_path.parent
my_dir

In [9]: print joined_path.name
readme.txt

In [10]: print joined_path.namebase
readme

In [11]: print joined_path.ext
.txt

In [12]: joined_path.copy('some_output_path.txt')

In [13]: print path('some_output_path.txt').isfile()
True

In [14]: print path('some_output_path.txt').isdir()
False

There are more operations available, but these are the ones I use most often. Notice that the path class inherits from the built-in string type, so a path can be used wherever a string is expected. Also notice that two or more path objects can easily be joined with the overloaded / operator.
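For comparison, most of the operations in the session above have standard-library counterparts in `os.path` and `shutil`. The mapping below is a rough summary of my own rather than something path.py documents; `namebase` and `ext` correspond to the two halves of `os.path.splitext` applied to the file name:

```python
import os

joined_path = os.path.join("my_dir", "readme.txt")  # my_dir / my_file
parent = os.path.dirname(joined_path)                # .parent
name = os.path.basename(joined_path)                 # .name
namebase, ext = os.path.splitext(name)               # .namebase, .ext

print(joined_path)
print(parent)
print(name)
print(namebase)
print(ext)
# .copy(dst) roughly corresponds to shutil.copy(src, dst), and
# .isfile() / .isdir() to os.path.isfile(p) / os.path.isdir(p).
```

The path.py version reads more naturally because each operation is a method or attribute on the path itself, whereas the `os.path` versions are free functions taking the string as an argument.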

Hope this helps!

naitsirhc