I'm trying to create a utility class for traversing all the files in a directory, including those within subdirectories and sub-subdirectories. I tried to use a generator because generators are cool; however, I hit a snag.


import os

def grab_files(directory):
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            yield grab_files(full_path)
        elif os.path.isfile(full_path):
            yield full_path
        else:
            print('Unidentified name %s. It could be a symbolic link' % full_path)

When the generator reaches a directory, it simply yields the memory location of the new generator; it doesn't give me the contents of the directory.

How can I make the generator yield the contents of the directory instead of a new generator?

If there's already a simple library function to recursively list all the files in a directory structure, tell me about it. I don't intend to replicate a library function.

+11  A: 

Why reinvent the wheel when you can use os.walk?

import os

path = '.'  # placeholder: the directory you want to traverse
for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))

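os.walk is a generator that walks a directory tree either top-down or bottom-up. For each directory it visits, it yields a (dirpath, dirnames, filenames) tuple, so joining dirpath with each entry in filenames gives you the full path of every file.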
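
If you still want a single generator that hands back full paths, as in the question, the loop above can be wrapped in a function. This is only a minimal sketch: the name iter_files is made up for illustration, and note that os.walk reports every non-directory entry (including symbolic links to files) in its file list.

```python
import os


def iter_files(top):
    """Yield the full path of every non-directory entry under `top`."""
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # descending into subdirectories on its own.
    for root, dirs, files in os.walk(top):
        for name in files:
            yield os.path.join(root, name)
```

Calling it looks like `for full_path in iter_files('/some/dir'): ...`, and nothing is read from disk until you start iterating.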

Nadia Alramli
But then again, by reinventing the wheel we could `os.cycle` rather than `os.walk`...
mjv
I think it's a joke... "reinventing the wheel"? Walking vs. cycling? Pretty good.. :)
Ned Batchelder
Yes, Ned, a joke. The suggestion to os.walk() is the way-to-go, unless one is merely trying to learn about generators and uses directory traversal as a practical exercise for it.
mjv
@Ned: I literally just facepalmed.
Jed Smith
@mjv good one ;)
Nadia Alramli
A: 

You can use path.py. Unfortunately the author's website is no longer around, but you can still download the code from PyPI. This library is a wrapper around path functions in the os module.

path.py provides a walkfiles() method which returns a generator iterating recursively over all files in the directory:

>>> from path import path
>>> print path.walkfiles.__doc__
 D.walkfiles() -> iterator over files in D, recursively.

        The optional argument, pattern, limits the results to files
        with names that match the pattern.  For example,
        mydir.walkfiles('*.tmp') yields only files with the .tmp
        extension.

>>> p = path('/tmp')
>>> p.walkfiles()
<generator object walkfiles at 0x8ca75a4>
>>>
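
If you would rather not add a dependency, roughly the same pattern filtering can be approximated with the standard library alone. The sketch below is not path.py's implementation; it swaps in os.walk plus fnmatch.filter, and the name walk_files is hypothetical.

```python
import fnmatch
import os


def walk_files(top, pattern=None):
    """Yield paths of files under `top`, optionally filtered by a glob pattern."""
    for root, dirs, files in os.walk(top):
        if pattern is not None:
            # Keep only the names that match the shell-style pattern, e.g. '*.tmp'.
            files = fnmatch.filter(files, pattern)
        for name in files:
            yield os.path.join(root, name)
```

With this, `walk_files('/tmp', '*.tmp')` would play the role of `mydir.walkfiles('*.tmp')` from the docstring above.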
Mike Mazur
+1  A: 

I agree with the os.walk solution.

Purely for the sake of pedantry: iterate over the nested generator and yield each of its entries, instead of yielding the generator object itself:


import os

def grab_files(directory):
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            for entry in grab_files(full_path):
                yield entry
        elif os.path.isfile(full_path):
            yield full_path
        else:
            print('Unidentified name %s. It could be a symbolic link' % full_path)
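
On Python 3.3 and later, the inner for loop can be replaced with `yield from`, which delegates to the nested generator and passes each of its values straight through. A minimal sketch of the same function written that way (the else branch is dropped here only for brevity):

```python
import os


def grab_files(directory):
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            # Delegate to the recursive generator; every path it yields
            # is forwarded to our caller one at a time.
            yield from grab_files(full_path)
        elif os.path.isfile(full_path):
            yield full_path
```

The caller now only ever sees path strings, never nested generator objects.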
sjthebat
Thanks for the example. I figured out this solution about five minutes after I had posted the question. XD
Evan Kroske