If doing a directory listing and reading the files within, at what point does the performance of yield start to deteriorate, compared to returning a list of all the files in the directory?

Here I'm assuming one has enough RAM to return the (potentially huge) list.

PS I'm having problems inlining code in a comment, so I'll put some examples in here.

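import glob

def list_dirs_list():
    # list version: builds the full list of matches before returning
    return glob.glob('/some/path/*')

def list_dirs_iter():
    # iterator version: hands back matches one at a time
    return glob.iglob('/some/path/*')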

Behind the scenes both calls to glob use os.listdir so it would seem they are equivalent performance-wise. But this Python doc seems to imply glob.iglob is faster.

+2  A: 

It depends on how you're doing the directory listing. Most mechanisms in Python pull the entire directory listing into a list; if doing it that way then even a single yield is a waste. If using opendir(3) then it's probably a random number, according to XKCD's definition of "random".
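As a rough illustration of the lazy approach, here is a minimal sketch that assumes Python 3.6 or later, where os.scandir returns an iterator over directory entries and can be used as a context manager. The helper name iter_file_paths and the temporary directory are made up for the demo; they are not from the question.

```python
import os
import tempfile

def iter_file_paths(directory):
    """Yield the path of each regular file in `directory`, one at a time."""
    # os.scandir hands back DirEntry objects lazily instead of a full list.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.path

# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(name)
    # Directory order is not guaranteed, so sort before comparing.
    found = sorted(os.path.basename(p) for p in iter_file_paths(tmp))
    print(found)
```

With this shape the caller can start reading the first file before the rest of the directory has been enumerated.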

Ignacio Vazquez-Abrams
Thanks. I was debating between os.listdir and os.walk, but I suppose that point is now moot (from a performance perspective). More generally, are there cases where too many yields become an issue (due to, for example, assumptions in Python's implementation)?
saidimu
Certainly nothing that comes to mind.
Ignacio Vazquez-Abrams
+5  A: 

There is no point at which further use of yield results in decreased performance. In fact, compared to assembling everything in a list, yield looks better the more elements there are, because it never has to build or hold the whole list.
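One way to see this is a small sketch (the names numbered_items and produced are invented for the illustration): a generator only does the work for the items the caller actually asks for, whereas a list version would compute all of them up front.

```python
from itertools import islice

produced = 0  # how many items the generator has actually computed

def numbered_items(n):
    """Yield 0..n-1, counting each item as it is produced."""
    global produced
    for i in range(n):
        produced += 1
        yield i

# Ask for only the first three of a million potential items.
first_three = list(islice(numbered_items(1_000_000), 3))
print(first_three, produced)
```

Only three items are ever computed here; building a list first would have produced all one million before the caller saw any of them.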

recursive
+1  A: 

Using yield is functionally similar to writing a functor class, even from an implementation or performance perspective, except that Python can probably resume a generator a little quicker than it can call the __call__ method on a hand-written class, because resumption is built into the generator's C implementation.

To hammer this home, the following two are used in much the same way and have roughly the same implementation:

def generator_counter():
    i = 0
    while True:
        i += 1
        yield i

class functor_counter():
    def __init__(self):
        self.i = 0
    def __call__(self):
        # the count lives on the instance, so it must be reached through self
        self.i += 1
        return self.i
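As a usage sketch, the two can be driven side by side. This assumes the functor keeps its count on self.i, and it repeats both definitions so the snippet runs on its own. The one visible difference is that the generator is advanced with next() while the functor is called directly.

```python
def generator_counter():
    i = 0
    while True:
        i += 1
        yield i

class functor_counter():
    def __init__(self):
        self.i = 0
    def __call__(self):
        self.i += 1
        return self.i

gen = generator_counter()
functor = functor_counter()

# Pull three values from each; both keep their state between calls.
from_generator = [next(gen) for _ in range(3)]
from_functor = [functor() for _ in range(3)]
print(from_generator, from_functor)
```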
TokenMacGuy