views:

74

answers:

4

(In Python 3.1) (Somewhat related to another question I asked, but this question is about iterators being exhausted.)

# trying to see the ratio of the max and min element in a container c
filtered = filter(lambda x : x is not None and x != 0, c)
ratio = max(filtered) / min(filtered)

It took me half hour to realize what the problem is (the iterator returned by filter is exhausted by the time it gets to the second function call). How do I rewrite it in the most Pythonic / canonical way?

Also, what can I do to avoid bugs of this sort, besides getting more experience? (Frankly, I don't like this language feature, since these types of bugs are easy to make and hard to catch.)

+1  A: 

The entity filtered is essentially an object with state. Of course, now it's obivous that running max or min on it will change that state. In order to stop tripping over that, I like to make it absolutely clear (to myself, really) that I am constructing something rather than just transforming something:

Adding an extra step can really help:

def filtered(container):
    return filter(lambda x : x is not None and x != 0, container)

ratio = max(filtered(c)) / min(filtered(c))

Whether you put filtered(...) inside some function (maybe it's not really needed for anything else) or define it as a module-level function is up to you, but in this case I'd suggest that if filtered (the iterator) was only needed in the function, leave it there until you need it elsewhere.

The other thing you can do is to construct a list from it, which will evaluate the iterator:

filtered_iter = filter(lambda x : x is not None and x != 0, container)
filtered = list(filtered_iter)

ratio = max(filtered) / min(filtered)

(Of course, you can just say filtered = list(filter(...)).)

detly
+1 : I'll use this technique, though I still dislike the whole "iterators are iterable" idea... feels like a bug waiting to happen.
max
@max - you can always go `list(...)` on an iterable to force it to be evaluated. I'll add something to my answer.
detly
@max - more generally though... what else would an iterator do, than iterate? Storing the whole thing in memory might be impractical or impossible (eg. indefinite prime generator), and rewinding is also only possible in all cases if you store iterator results somehow. Read the docs for the function you're using, and convert to a list if you need to. Or maybe I've missed your point?
detly
No, you're right. I was just not thinking clearly.
max
+3  A: 

you can convert an iterator to a tuple simply by calling tuple(iterator)

however I'd rewrite that filter as a list comprehension, which would look something like this

# original
filtered = filter(lambda x : x is not None and x != 0, c)

# list comp
filtered = [x for x in c if x is not None and x != 0]
lunixbochs
+1 Yup - just what I was thinking myself. Makes it trivial to check if `filtered` is empty too: `len(filtered) != 0`
max
You don't need to check the length of a list to see if it is empty - it will directly evaluate to False if it is empty, and True otherwise.
lunixbochs
See this for clarification: http://docs.python.org/library/stdtypes.html#truth-value-testing
lunixbochs
+3  A: 

The itertools.tee function can help here:

import itertools

f1, f2 = itertools.tee(filtered, 2)
ratio = max(f1) / min(f2)
Greg Hewgill
+1 - never heard of tee until now... Neat, though it's tricky to use (have to make sure I don't reuse filtered after that call, and also that it's not worse performance-wise than building the list).
max
tee will effectively make a complete copy of the filtered list internally, so it is pretty much equivalent to `filtered_list = list(filtered_iter)`. And your concern of trickiness is not an issue with this specific solution, but is inherent in using iterators - *whenever* you have an iterator, you have to make sure not to try consuming it twice. @THC4k's posted answer is more what you should be using - no additional copies of the iterated data, and no need to try "tee"ing iterators, etc.
Paul McGuire
+3  A: 

Actually your code raises an exception that would prevent this problem! So I guess the problem was that you masked the exception?

>>> min([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: min() arg is an empty sequence
>>> min(x for x in ())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: min() arg is an empty sequence

Anyways, you can also write a new function to give you the min and max at the same time:

def minmax( seq ):
    " returns the `(min, max)` of sequence `seq`"
    it = iter(seq)
    try:
        min = max = next(it)
    except StopIteration:
        raise ValueError('arg is an empty sequence')
    for item in it:
        if item < min:
            min = item
        elif item > max:
            max = item
    return min, max
THC4k
Given an iterator, the 2nd best solution is to store the contents in a list, and then read the list as many times as you like. This solution, adjusting your algorithm to do all it needs in just a single scan of the iterator is better, if you can do it.
Paul McGuire
@THC4k well the bug was that I was catching ValueError thinking it was due to the iterator being emtpy to start with; not due to the iterator being exhausted by my own code!
max