What (if any) performance advantages are offered by using iterators? It seems like the 'Right Way' to solve many problems, but does it create faster/more memory-conscious code? I'm thinking specifically of Python, but don't restrict answers to just that.

+2  A: 

Iterators are just classes that implement a particular interface, specifically an interface for advancing to the next item. In Python, lists, tuples, dicts, strings, and files all support this interface: calling iter() on any of them returns an iterator. If an iterator is implemented poorly it may perform poorly, but there is nothing inherent to the interface that implies good or bad performance.
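
To make that concrete, here's a minimal sketch of the protocol (the CountDown class is illustrative, not from any library): iter() calls __iter__(), and a for loop calls next() repeatedly until StopIteration is raised (Python 3 renames the method to __next__()).

class CountDown(object):
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        # An iterator is its own iterable.
        return self
    def next(self):  # spelled __next__ in Python 3
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

print list(CountDown(3))  # [3, 2, 1]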

Ryan Bright
What you're saying is technically true to a point. However, I disagree that speed is a result of the *quality* of the underlying data structure. It depends more on whether the data structure is the right one for the task or if one is really even required.
Jason Baker
My point is that none of this has to do with iterators as asked in the question. With an iterator, you call next() until StopIteration is raised. What that next() is doing is where your performance metric is. In the end, the accepted answer is about generators, not iterators so I guess it's moot.
Ryan Bright
+1  A: 

An iterator is simply an object that provides methods for traversing a collection. You could traverse all of the elements of an array or all the nodes of a tree with the same interface. Trees and arrays are very different data structures and require different traversal methods, but with an iterator you can loop through all elements in the same way.

For one type of collection, there may also be different ways to traverse it, and a single collection could have multiple iterators. You could have a depth-first iterator or a breadth-first iterator traversing a tree structure, returning the nodes in different orders. Iterators are not intended for performance; they typically provide a consistent interface for traversing structures.
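
As a sketch of that idea (Node, depth_first, and breadth_first are illustrative names): two generator-based iterators walk the same tree through the same interface, just in different orders.

class Node(object):
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def depth_first(node):
    # Yield the node itself, then recurse into each subtree.
    yield node.value
    for child in node.children:
        for value in depth_first(child):
            yield value

def breadth_first(root):
    # Visit level by level with an explicit queue.
    queue = [root]
    while queue:
        node = queue.pop(0)
        yield node.value
        queue.extend(node.children)

tree = Node(1, [Node(2, [Node(4)]), Node(3)])
print list(depth_first(tree))    # [1, 2, 4, 3]
print list(breadth_first(tree))  # [1, 2, 3, 4]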

markt
+3  A: 

The primary benefit of iterators is not performance. In my experience, the most performant solution is usually an algorithm that embeds knowledge of your chosen data structure. The benefit of iterators is that they allow you to decouple data and algorithm and, therefore, generalize and reuse both. If this can also be done with little or no performance degradation, then it's a net gain.

My favorite example of iterator usage can be found in the C++ Standard Template Library. It manages to demonstrate the power and beauty of the abstraction by cleanly separating container and algorithm without sacrificing performance. Understanding this design had a profound effect on the way I think about code.
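
A rough Python analogue of that separation (running_max is an illustrative name): one algorithm written against the iterator interface alone, reusable over any data source.

def running_max(iterable):
    # Written against the iterator interface only, so it works on
    # lists, generators, files, ... without knowing which it got.
    best = None
    for item in iterable:
        if best is None or item > best:
            best = item
        yield best

print list(running_max([3, 1, 4, 1, 5]))  # [3, 3, 4, 4, 5]
print list(running_max(xrange(4)))        # [0, 1, 2, 3]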

Waylon Flinn
+10  A: 

For Python, generators will be faster and have better memory efficiency. Think of range(1000) vs. xrange(1000). (This changed in 3.0: range is now lazy, like the old xrange.) range pre-builds the whole list, while xrange just returns an object that yields the next item when needed.

The performance difference isn't great on small inputs, but as soon as you start cranking out larger and larger sets of data you'll notice it quite quickly. Besides having to generate the whole list before you can step through it, the pre-built list consumes extra memory, whereas a generator creates only one item at a time.
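
As a rough sketch of what that laziness looks like (lazy_range is an illustrative name, not the real xrange implementation):

def lazy_range(n):
    # Produces one value at a time; no list is ever built.
    i = 0
    while i < n:
        yield i
        i += 1

print sum(lazy_range(1000000))  # 499999500000, one int alive at a time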

Christian Witts
+4  A: 

There's actually a very good email on the Python mailing list about this: Iterators vs Lists. It's a bit dated (from 2003), but as far as I know it's still valid.

Here's the summary:

For small datasets, iterator- and list-based approaches have similar performance. For larger datasets, iterators save both time and space.

What I would draw from it is this: prefer iterators over loading data into a list where possible. But unless you have a big dataset, don't contort code that naturally fits in a list just to make it work with an iterator.
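
One quick way to see the space half of that trade-off (sys.getsizeof needs Python 2.6+; exact numbers vary by interpreter and platform):

import sys

full = range(100000)                 # the whole list lives in memory
lazy = (i for i in xrange(100000))   # generator expression

# The list's footprint grows with the data; the generator's does not.
print sys.getsizeof(full)  # hundreds of kilobytes on a typical build
print sys.getsizeof(lazy)  # a constant few dozen bytes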

Jason Baker
+2  A: 

To back up @Christian Witts's answer:

range vs. xrange performance

python25 -mtimeit "for i in xrange(1000): pass"
10000 loops, best of 3: 56.3 usec per loop

python25 -mtimeit "for i in range(1000): pass"
10000 loops, best of 3: 80.9 usec per loop

python26 -mtimeit "for i in xrange(1000): pass"
10000 loops, best of 3: 48.8 usec per loop

python26 -mtimeit "for i in range(1000): pass"
10000 loops, best of 3: 68.6 usec per loop

BTW, neither range() nor xrange() is an iterator:

>>> hasattr(range(1), 'next')
False
>>> hasattr(xrange(1), 'next')
False
>>> iter(xrange(1))
<rangeiterator object at 0x0097A500>
>>> iter(range(1))
<listiterator object at 0x00A7BFD0>
>>> iter([])
<listiterator object at 0x00A7BE30>
>>> iter(i for i in (1,))
<generator object at 0x00A7F940>
>>> (i for i in (1,))
<generator object at 0x00A7FDC8>
J.F. Sebastian
BTW, the answer for python30 is 31.5 usec; it doesn't really fit into your comparison, but good to know, I reckon
SilentGhost
@SilentGhost: there is no `xrange` in Python 3.x therefore nothing to compare to.
J.F. Sebastian
@SilentGhost: Also, unless you have access to J.F. Sebastian's computer, the comparison is not very useful..
John Fouhy
It should be noted that the times are in microseconds... there are probably better places in your code to spend your time optimising (like the database access)
Jiaaro
@Jim: 1. The OP *does* ask about *performance* advantages. 2. *Measure* first, optimize second (don't guess that it is the database access, prove it and only then optimize it).
J.F. Sebastian
+1  A: 

My inference from the answers above is: "Use lists to write the code. If necessary, refactor using iterators." The difference isn't apparent unless you have a large dataset.
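
In Python that refactoring is often tiny ('data.txt' below is a hypothetical file): dropping the square brackets turns the list comprehension into a generator expression, and the intermediate list disappears.

# First draft: materializes every length in an intermediate list.
total = sum([len(line) for line in open('data.txt')])

# Iterator refactor: same result, one line processed at a time.
total = sum(len(line) for line in open('data.txt'))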

Another thing to note is that, even when lists are used frequently, the dataset we operate on often gets progressively smaller and smaller.

Lakshman Prasad