



I am currently working on a personal learning project in which I read in an XML database. I find myself writing functions that gather data, and I'm not sure what would be a fast way to return it.

Which is generally faster:

  1. yields, or
  2. several append()s within the function then return the ensuing list?

I would be happy to know in which situations yield would be faster than append(), or vice versa.

+7  A: 

yield has the huge advantage of being lazy, and speed is usually not the best reason to use it. But if it works in your context, there is no reason not to use it:

data = range(1000)

def yielding():
    def yielder():
        for d in data:
            yield d
    return list(yielder())

def appending():
    lst = []
    for d in data:
        lst.append(d)
    return lst

This is the result:

python2.7 -m timeit -s "from yield_vs_append import yielding,appending" "yielding()"
10000 loops, best of 3: 80.1 usec per loop

python2.7 -m timeit -s "from yield_vs_append import yielding,appending" "appending()"
10000 loops, best of 3: 130 usec per loop

At least in this very simple test, yield is faster than append.
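
The laziness mentioned at the top is easier to see when the consumer stops early. Below is a minimal sketch; the names `numbers_list`, `numbers_gen` and `first_match` are hypothetical and not part of the benchmark above, and the sizes reported by `sys.getsizeof` only cover the container object itself, not the elements.

```python
import sys

def numbers_list(n):
    """Build the whole list up front with append()."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def numbers_gen(n):
    """Yield the same values one at a time, on demand."""
    for i in range(n):
        yield i * i

def first_match(iterable, predicate):
    """Return the first item satisfying predicate, or None."""
    for item in iterable:
        if predicate(item):
            return item
    return None

# Both calls give the same answer, but the generator version stops
# computing as soon as a match is found, while the list version
# computes all n squares before the search even starts.
a = first_match(numbers_list(100000), lambda x: x > 50)
b = first_match(numbers_gen(100000), lambda x: x > 50)

# A generator object stays small no matter how many items it can
# produce; the list holds a reference to every element.
size_list = sys.getsizeof(numbers_list(100000))
size_gen = sys.getsizeof(numbers_gen(100000))
```

If the caller always needs every element as a list anyway, as in the benchmark above, this advantage largely disappears.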

Does _lazy_ mean _low memory requirement_?
+4  A: 

I recently asked myself a similar question while exploring ways to generate all permutations of a list (or tuple), either by appending to a list or via a generator, and found (for permutations of length 9, which take about a second or so to generate):

  • The naive approach (permutations are lists, append to list, return list of lists) takes about three times the time of itertools.permutations.
  • Using a generator (i.e. yield) reduces this by approx. 20 %.
  • Using a generator and generating tuples is the fastest, at about twice the time of itertools.permutations.

Take with a grain of salt! Timing and profiling were very useful:

if __name__ == '__main__':
    import cProfile
    cProfile.run("main()")
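
The variants compared in the list above might look roughly like the sketch below. This is not the code that was actually timed; `permutations_list` and `permutations_gen` are hypothetical names, and the recursion is just one straightforward way to write it.

```python
import itertools

def permutations_list(items):
    """Naive variant: permutations are lists, collected with append()."""
    if len(items) <= 1:
        return [list(items)]
    result = []
    for i in range(len(items)):
        rest = items[:i] + items[i + 1:]
        for perm in permutations_list(rest):
            result.append([items[i]] + perm)
    return result

def permutations_gen(items):
    """Generator variant: yields tuples one at a time."""
    items = tuple(items)
    if len(items) <= 1:
        yield items
        return
    for i in range(len(items)):
        rest = items[:i] + items[i + 1:]
        for perm in permutations_gen(rest):
            yield (items[i],) + perm

# Sanity check against the standard library before timing anything.
reference = sorted(itertools.permutations([1, 2, 3]))
from_list = sorted(tuple(p) for p in permutations_list([1, 2, 3]))
from_gen = sorted(permutations_gen([1, 2, 3]))
```

Checking both variants against itertools.permutations first makes sure the timings compare functions that produce the same result.
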
+3  A: 

There is an even faster alternative to TH4Ck's yielding(): a list comprehension.

In [245]: def list_comp():
   .....:     return [d for d in data]

In [246]: timeit yielding()
10000 loops, best of 3: 89 us per loop

In [247]: timeit list_comp()
10000 loops, best of 3: 63.4 us per loop

Of course, it is rather silly to micro-benchmark these operations without knowing the structure of your code. Each of them is useful in a different situation. For example, a list comprehension is useful if you want to apply a simple operation that can be expressed as a single expression. yield has a significant advantage in that it lets you isolate the traversal code in a generator function. Which one is appropriate depends a lot on the usage.
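
As a rough illustration of combining the two for the XML case in the question: a generator can own the traversal while a list comprehension applies the simple per-item operation. The sample document, the `iter_books` function and the element and attribute names are all made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the XML database.
SAMPLE = """
<library>
  <book year="1999"><title>A</title></book>
  <book year="2005"><title>B</title></book>
  <book year="2011"><title>C</title></book>
</library>
"""

def iter_books(root):
    """Traversal lives here; callers never touch the tree structure."""
    for book in root.iter("book"):
        yield book.findtext("title"), int(book.get("year"))

root = ET.fromstring(SAMPLE.strip())

# The list comprehension applies a simple, single-expression filter
# to whatever the generator produces.
recent_titles = [title for title, year in iter_books(root) if year > 2000]
```

If the tree layout changes, only `iter_books` needs to be touched; the comprehension stays the same.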

Wai Yip Tung
I actually wanted to include list comprehensions, but I'm choosing between these two: `[n for n in func_that_yields()]` or `[n for n in func_that_returns_an_iterable()]`. Note that `n` can be a simple element unpack, or a complex element-by-element operation. Anyway, good point you have in there :)