This is rather the inverse of What can you use Python generator functions for?: Python generators, generator expressions, and the `itertools` module are some of my favorite features of Python these days. They're especially useful when setting up chains of operations to perform on a big pile of data; I often use them when processing DSV files.
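To make the kind of chain I mean concrete, here's a minimal sketch (the filename, delimiter, and stage functions are placeholders I made up, and it assumes each kept row has at least two fields):

```python
import csv

def read_rows(path, delimiter="|"):
    """Lazily yield rows from a delimiter-separated file."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            yield row

def drop_blank(rows):
    """Generator-expression stage: skip rows with no non-empty fields."""
    return (row for row in rows if any(field.strip() for field in row))

def first_two_fields(rows):
    """Generator-expression stage: project each row to its first two fields."""
    return ((row[0], row[1]) for row in rows)

# Chain the stages; no row is read from disk until the loop pulls it through.
pipeline = first_two_fields(drop_blank(read_rows("data.psv")))
for key, value in pipeline:
    print(key, value)
```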
So when is it not a good time to use a generator, or a generator expression, or an `itertools` function?
- When should I prefer `zip()` over `itertools.izip()`, or `range()` over `xrange()`, or `[x for x in foo]` over `(x for x in foo)`? (A small sketch of that last pair follows below.)
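For that last pair, the trade-off I'm picturing looks roughly like this (Python 3 shown for runnability; under Python 2, `itertools.izip()` and `xrange()` are the lazy counterparts of `zip()` and `range()`, which only became lazy by default in Python 3):

```python
import sys

data = range(100000)

eager = [x * x for x in data]   # list comprehension: built in full, up front
lazy = (x * x for x in data)    # generator expression: produces values on demand

print(sys.getsizeof(eager))  # grows with the number of elements
print(sys.getsizeof(lazy))   # small and constant: just the generator object

print(len(eager), eager[42])  # a list supports len(), indexing, re-iteration
print(sum(lazy))              # a generator can only be walked once, front to back
```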
Obviously, we eventually need to "resolve" a generator into actual data, usually by creating a list or iterating over it with a non-generator loop. Sometimes we just need to know the length. This isn't what I'm asking.
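By "resolve" I mean something like this, just for clarity:

```python
nums = (n * n for n in range(10))

as_list = list(nums)    # materialize into an actual list
print(len(as_list))     # now len() and indexing work

nums = (n * n for n in range(10))   # a fresh generator; the first one is exhausted
count = sum(1 for _ in nums)        # or just count the items without storing them
print(count)
```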
We use generators so that we're not allocating new lists in memory for interim data. This especially makes sense for large datasets. Does it make sense for small datasets too? Is there a noticeable memory/CPU trade-off?
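A rough way to see the memory side of that question is something like the following; note that `sys.getsizeof()` only counts the list or generator object itself, not the element objects it refers to, so this is only an approximation:

```python
import sys

# Compare container sizes for small and large datasets.
for n in (10, 1000, 1000000):
    as_list = [i * i for i in range(n)]
    as_gen = (i * i for i in range(n))
    print(n, sys.getsizeof(as_list), sys.getsizeof(as_gen))
```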
I'm especially interested in whether anyone has done any profiling on this, in light of the eye-opening discussion of list comprehension performance vs. `map()` and `filter()`.
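I haven't measured this carefully myself; the kind of micro-benchmark I have in mind is just something like the sketch below, with the caveat that the numbers will vary by interpreter version and workload:

```python
import timeit

setup = "data = range(1000)"

for stmt in (
    "sum([x * x for x in data])",       # list comprehension, then sum the list
    "sum(x * x for x in data)",         # generator expression fed straight to sum()
    "sum(map(lambda x: x * x, data))",  # map() with a lambda
):
    print(stmt, timeit.timeit(stmt, setup=setup, number=10000))
```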