ansaurus

Question

How to tell the difference between an iterator and an iterable?

Answer 1

+7 A:

" However, there is an important semantic difference between the two..."

Not really semantic or important. They're both iterable -- they both work with a for statement.

"The difference is for example important when one wants to loop multiple times."

When does this ever come up? You'll have to be more specific. In the rare cases when you need to make two passes through an iterable collection, there are often better algorithms.

For example, let's say you're processing a list. You can iterate through a list all you want. Why did you get tangled up with an iterator instead of the iterable? Okay that didn't work.

Okay, here's one. You're reading a file in two passes, and you need to know how to reset the iterable. In this case, it's a file, and seek is required; or a close and a reopen. That feels icky. You can readlines to get a list which allows two passes with no complexity. So that's not necessary.

Wait, what if we have a file so big we can't read it all into memory? And, for obscure reasons, we can't seek, either. What then?

Now, we're down to the nitty-gritty of two passes. On the first pass, we accumulated something. An index or a summary or something. An index has all the file's data. A summary, often, is a restructuring of the data. With a small change from "summary" to "restructure", we've preserved the file's data in the new structure. In both cases, we don't need the file -- we can use the index or the summary.

All "two-pass" algorithms can be changed to one pass of the original iterator or iterable and a second pass of a different data structure.
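A minimal sketch of that claim, written for Python 3 rather than the Python 2 used elsewhere in this thread. The `lines` generator and the "key value" record format are hypothetical stand-ins for the big, unseekable file. The first pass consumes the source once and builds a summary; the second pass runs over the summary only.

```python
from collections import defaultdict

def lines():
    # Hypothetical stand-in for a large, unseekable source; it can be consumed only once.
    yield from ["apple 3", "pear 5", "apple 4", "pear 1"]

def summarize(source):
    # First (and only) pass over the source: restructure it into per-key totals.
    totals = defaultdict(int)
    for line in source:
        key, value = line.split()
        totals[key] += int(value)
    return totals

def report(totals):
    # The "second pass" iterates over the summary, not the original iterator.
    grand_total = sum(totals.values())
    return {key: value / grand_total for key, value in totals.items()}

summary = summarize(lines())
shares = report(summary)
```

The summary is an ordinary dict, so it can be iterated as many times as needed without resetting anything.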

This is neither LYBL nor EAFP. This is algorithm design. You don't need to reset an iterator -- YAGNI.

Edit

Here's an example of an iterator/iterable issue. It's simply a poorly-designed algorithm.

it = iter(xrange(3))
for i in it: print i,  # prints 0 1 2
for i in it: print i,  # prints nothing: the iterator is already exhausted

This is trivially fixed.

it = range(3)
for i in it: print i
for i in it: print i

The "multiple times in parallel" is trivially fixed. Write an API that requires an iterable. And when someone refuses to read the API documentation or refuses to follow it after having read it, their stuff breaks. As it should.

Both this and the "nice to safeguard against the case where a user provides only an iterator when multiple passes are needed" case are examples of insane people writing code that breaks our simple API.

If someone is insane enough to read most (but not all) of the API doc and provide an iterator when an iterable was required, you need to find this person and teach them (1) how to read all the API documentation and (2) how to follow the API documentation.

The "safeguard" issue isn't very realistic. These crazy programmers are remarkably rare. And in the few cases when it does arise, you know who they are and can help them.

Edit 2

The "we have to read the same structure multiple times" algorithms are a fundamental problem.

Do not do this.

for element in someBigIterable:
    function1( element )
for element in someBigIterable:
    function2( element )
...

Do this, instead.

for element in someBigIterable:
    function1( element )
    function2( element )
    ...

Or, consider something like this.

for element in someBigIterable:
    for f in ( function1, function2, function3, ... ):
        f( element )

In most cases, this kind of "pivot" of your algorithms results in a program that might be easier to optimize and might be a net improvement in performance.
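A runnable sketch of the pivot, in Python 3. The names `apply_all`, `record` and `accumulate` are made up for illustration and stand in for `function1`, `function2`, and so on. Because every function sees each element before the loop moves on, the source can be a one-shot generator.

```python
def apply_all(iterable, functions):
    # One pass: each element is handed to every function before moving on.
    for element in iterable:
        for f in functions:
            f(element)

seen = []
total = [0]

def record(x):
    # Illustrative stand-in for function1: remember each element.
    seen.append(x)

def accumulate(x):
    # Illustrative stand-in for function2: keep a running sum.
    total[0] += x

# A generator can be consumed only once, which is all a single pass needs.
apply_all((n * n for n in range(4)), [record, accumulate])
```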

S.Lott 2009-04-02 10:09:59

What about multiple times in parallel? E.g. several threads iterating over the same collection? Or even one thread, such as an easily-imagined naive implementation of "does this collection have the same element twice?".

Edmund 2009-04-02 10:16:15

Thanks, I added an explanation to the question. You have a valid point, but in my case I believe this does not work.

nikow 2009-04-02 10:22:32

"remarkably rare". I'd disagree: programmers who can't tell an iterable from an iterator are not by any means rare. "you know who they are and can help them." That's usually not your job, and in a corporation "helping them" would not be very well perceived, especially if it's another department.

vartec 2009-04-02 11:10:15

@vartec: it's your application/library/framework, you need to support it. Helping the crazy programmers who refuse to read the API and can't figure out why it broke when they didn't follow the rules is support as I understand it. It *is* well perceived in my experience.

S.Lott 2009-04-02 11:15:31

@S.Lott: In a perfect world you're right. In corporate politics, saying that the other department's code isn't correct results in conflict. If the other dept. has more political influence, your help will be perceived as "trying to cover incompetence". And it doesn't matter if you're right or wrong.

vartec 2009-04-02 11:22:42

+1 for your edit. I was taught that this is programming by contract. If one party doesn't comply, the other party doesn't need to comply, either.

unbeknown 2009-04-02 11:29:22

@S.Lott: In this context, may I call upon your attention to this question: http://stackoverflow.com/questions/701088/py3k-memory-conservation-by-returning-iterators-rather-than-lists Thanks!

Lakshman Prasad 2009-04-02 11:53:19

@becomingGuru: preoccupation with memory management can become silly. My point is that many "2-pass" algorithms do significant data reduction on the first pass; the second pass is not necessary because it's working on a smaller data structure.

S.Lott 2009-04-02 12:01:59

@vartec: if your organization's corporate politics are so dysfunctional that help == conflict, you should find a better organization to work for. Writing useless code to work around organizational problems is an epic fail waiting to happen.

S.Lott 2009-04-02 12:07:57

@heikogerlach: more importantly, if one party won't comply, the other party can't coerce compliance. If they won't comply, they wrote the bug; you can't fix their refusal to comply.

S.Lott 2009-04-02 12:43:55

@S.Lott: I already did. But as far as I know it's pretty much the same in most big corporations. Big bureaucracies are always inefficient.

vartec 2009-04-02 21:07:47

@vartec: Over the last 30 years, I've never worked at a place where helping someone became a conflict or was perceived badly. An API contract has never been a problem in hundreds of locations. Convoluted code to help the crazy programmers who can't follow the API is -- simply -- bad.

S.Lott 2009-04-02 21:15:00

+1 for addressing the question of 2 passes over a large data set at such length. I agree - if it seems like you need to iterate over the exact same, unchanged, entire data set twice, there's a design issue that needs to be addressed.

Jarret Hardie 2009-04-02 22:44:55

Answer 2

+10 A:

'iterator' if obj is iter(obj) else 'iterable'
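A sketch of how this expression might be wrapped and tried out, in Python 3. The function name `classify` is made up here. It relies on the convention that an iterator returns itself from `__iter__()`, so treat the result as a heuristic, not a guarantee.

```python
def classify(obj):
    # iter() raises TypeError for objects that are not iterable at all.
    # By convention, an iterator returns itself from __iter__().
    return 'iterator' if obj is iter(obj) else 'iterable'

kinds = [
    classify([1, 2, 3]),        # a list hands out a fresh iterator each time
    classify(iter([1, 2, 3])),  # a list iterator returns itself
    classify(x for x in ()),    # a generator is its own iterator
]
```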

vartec 2009-04-02 10:11:29

Wow, this seems to be the answer that I have been looking for, thanks! I will wait a little before accepting it, in case somebody can point out a problem with this.

nikow 2009-04-02 10:24:40

Well, the problem is one "wasted" call to obj.__iter__(), but I don't see any other reliable way to do it.

vartec 2009-04-02 10:28:43

Although I don't know a counter-example, this is not *guaranteed* to work.

ΤΖΩΤΖΙΟΥ 2009-04-02 23:26:26

@ΤΖΩΤΖΙΟΥ: well, you could imagine objects that don't have .next() but have __iter__(self) = lambda x: x

vartec 2009-04-03 07:37:07

@ΤΖΩΤΖΙΟΥ: but then again, what would be the point of such an object?

vartec 2009-04-03 07:39:13

I never said anything about objects not having .next. Your premise is `iter(obj) is obj`, which AFAIK is true, but it's not guaranteed.

ΤΖΩΤΖΙΟΥ 2009-04-03 19:45:18

Answer 3

A:

Because of Python's duck typing,

An object is an iterator if it defines a next() method and its __iter__() method returns the object itself.

If the object itself doesn't have a next() method, it is still iterable as long as its __iter__() returns some object that has a next() method.

You could refer to this question to see Iterability in Python

Lakshman Prasad 2009-04-02 11:50:17

Try this:

class A(object):
    def __iter__(self):
        return iter([1,2,3])
    def next(self):
        yield 7

vartec 2009-04-02 12:04:55

Actually this is a problem of duck typing: it can hide a semantic / conceptual difference. It allows us to write for i in range(3) instead of for i in iter(range(3)), but can cause subtle problems.

nikow 2009-04-02 12:05:38

Sorry, I did not exactly get the point? Something wrong?

Lakshman Prasad 2009-04-02 12:49:50

Answer 4

+1 A:

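import itertools

def process(iterable):
    # tee() returns two independent iterators over the same underlying data
    work_iter, backup_iter = itertools.tee(iterable)

    for item in work_iter:
        # bla bla
        if need_to_startover():  # placeholder predicate, not defined here
            for another_item in backup_iter:
                pass  # second pass over the same items goes here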
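A small runnable illustration of the same idea in Python 3. The helper name `two_passes` is made up for this sketch. Note that `itertools.tee` buffers the items one iterator has consumed and the other has not, so two complete passes keep the whole input in memory.

```python
import itertools

def two_passes(iterable):
    # tee() buffers items so both returned iterators see the full sequence.
    work_iter, backup_iter = itertools.tee(iterable)
    first = [item for item in work_iter]     # first pass
    second = [item for item in backup_iter]  # "start over" on the backup copy
    return first, second

# Works even though the argument is a one-shot iterator.
first, second = two_passes(iter(range(3)))
```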

That damn time machine that Raymond borrowed from Guido…

ΤΖΩΤΖΙΟΥ 2009-04-02 23:24:19
