This simple statement:

zip(xrange(0, 11614321), xrange(0, 11627964))

...is eating most of my RAM. (>150 MiB!) Why?

Edit: Ah, re-reading the docs, I see that zip returns a list, not an iterator. Is there anything like zip that returns an iterator?


The larger picture: I'm iterating over two large arrays of file data and comparing them at different offsets: (0-end, 0-end), (0-end, 1-end), and so on. I'd rather not slice the arrays, since each slice allocates a copy. I figured I'd iterate over the indexes instead, but as shown above, that doesn't help either. The whole code:

def subsequence_length(data_a, data_b, loc_a, loc_b):
    length = 0
    for i_a, i_b in zip(xrange(loc_a, len(data_a)), xrange(loc_b, len(data_b))):
        if data_a[i_a] == data_b[i_b]:
            length += 1
        else:
            break
    return length
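For anyone reading this on Python 3: there, `xrange` was renamed `range` and the built-in `zip` already returns a lazy iterator. A minimal sketch of a straight port of the function above, which never builds the list of index pairs:

```python
def subsequence_length(data_a, data_b, loc_a, loc_b):
    """Count consecutive equal items starting at data_a[loc_a] and data_b[loc_b]."""
    length = 0
    # In Python 3, range is lazy and zip returns an iterator, so no list of
    # index pairs is built; zip stops when the shorter range runs out.
    for i_a, i_b in zip(range(loc_a, len(data_a)), range(loc_b, len(data_b))):
        if data_a[i_a] != data_b[i_b]:
            break
        length += 1
    return length
```

For example, `subsequence_length(b"abcdef", b"abcxef", 0, 0)` returns 3.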
A:

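Use `izip` from `itertools`. It returns an iterator that produces the pairs one at a time instead of building the whole list up front.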

nos
Exactly what I need. Thanks!
Thanatos
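One possible index-free alternative, sketched for Python 3 (in Python 2, use `izip` in place of `zip`): `itertools.islice` walks each sequence from the given offset without copying it, and `itertools.takewhile` stops at the first mismatching pair. The function keeps the question's name and signature so it can be dropped in; this is just one approach, not the only one.

```python
from itertools import islice, takewhile


def subsequence_length(data_a, data_b, loc_a, loc_b):
    """Count consecutive equal items starting at data_a[loc_a] and data_b[loc_b]."""
    # islice yields items from the given offset onward without copying the
    # sequence, although it has to step over the first loc items to get there.
    pairs = zip(islice(data_a, loc_a, None), islice(data_b, loc_b, None))
    # zip stops at the shorter input; takewhile stops at the first mismatch.
    matching = takewhile(lambda pair: pair[0] == pair[1], pairs)
    return sum(1 for _ in matching)
```

The trade-off: neither this nor the index-based version copies the data, but `islice` must skip the first `loc_a`/`loc_b` items one by one, so for large offsets direct indexing does less work.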
A:

If for some reason you didn't want to use the itertools module, it would be trivial to write your own generator that does the same thing, at least if you know you're dealing with exactly two inputs.

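def xzip2(i1, i2):
    i1, i2 = iter(i1), iter(i2)  # accept any iterables, e.g. xrange objects
    while True:
        # In Python 2 (and Python 3 before 3.7), the StopIteration raised by
        # next() when either input runs out simply ends this generator.
        yield next(i1), next(i2)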
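Since Python 3.7 (PEP 479), a StopIteration that escapes a generator's body is turned into a RuntimeError, so the loop above no longer ends cleanly there. A minimal sketch of the same two-input generator that catches the exception and returns explicitly; it should behave the same in Python 2:

```python
def xzip2(i1, i2):
    """Yield pairs from two iterables until either one is exhausted."""
    i1, i2 = iter(i1), iter(i2)
    while True:
        try:
            pair = next(i1), next(i2)
        except StopIteration:
            # End the generator explicitly instead of letting StopIteration
            # escape (which PEP 479 turns into RuntimeError in Python 3.7+).
            return
        yield pair
```

For example, `list(xzip2("abc", range(10)))` gives `[('a', 0), ('b', 1), ('c', 2)]`.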

Actually, upon further reflection, it isn't hard to make it work with any number of iterables. itertools.izip itself is written in C, but the pure-Python equivalent given in the itertools documentation works much like this.

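def xzip(*iters):
    iters = [iter(i) for i in iters]
    while True:
        # A list comprehension (not a generator expression) lets the
        # StopIteration from an exhausted input end this generator.
        yield tuple([next(i) for i in iters])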
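The same PEP 479 change (Python 3.7+) affects this version: the StopIteration from the list comprehension escapes the generator and surfaces as a RuntimeError. A sketch of a Python 3-safe variadic version, which also stops immediately when called with no arguments (the version above would yield empty tuples forever in that case, whereas `zip()` produces nothing):

```python
def xzip(*iterables):
    """Yield tuples from any number of iterables, stopping at the shortest."""
    iters = [iter(it) for it in iterables]
    # With no inputs, zip() produces nothing; without this guard the loop
    # below would yield empty tuples forever.
    if not iters:
        return
    while True:
        try:
            # A list comprehension is not a generator, so StopIteration
            # from next() propagates to this except clause unchanged.
            items = [next(it) for it in iters]
        except StopIteration:
            return
        yield tuple(items)
```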

For extra special fun, consider why a list comprehension is needed in the yield statement. Why can't I just do yield tuple(next(i) for i in iters)? Because then the xzip iterator never terminates. Why? Because the tuple constructor swallows the StopIteration raised when one of the input iterators runs out, instead of letting it propagate to the generator machinery. Why? The tuple constructor expects that same exception at the end of the generator expression, and when it gets it, it assumes it has reached the end of the generator expression, not the end of one of the input iterators. So xzip never finds out that an input is exhausted; it just keeps going, yielding shorter and shorter tuples until (if all the inputs are finite) it yields an empty tuple every time. With the list comprehension, a StopIteration raised while building the list propagates out of xzip and ends it; only if the list comprehension succeeds do we convert the result to a tuple. And we only convert it to a tuple because that's what the regular zip returns. Told you it was fun.
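The explanation above describes Python 2 (and Python 3 before 3.7). Under PEP 479 (Python 3.7+), a StopIteration raised inside a generator expression is converted to RuntimeError, so the tuple-of-generator-expression variant fails loudly instead of looping forever. A small sketch to check this; `xzip_genexpr` is a hypothetical name for that variant, and `islice` caps the output in case it does loop:

```python
from itertools import islice


def xzip_genexpr(*iterables):
    """Hypothetical variant that feeds tuple() a generator expression."""
    iters = [iter(it) for it in iterables]
    while True:
        yield tuple(next(it) for it in iters)


try:
    # islice caps the output at 5 tuples in case the generator never ends
    # (the pre-3.7 behaviour described above).
    print(list(islice(xzip_genexpr([1, 2], [3]), 5)))
except RuntimeError as exc:
    # Python 3.7+: PEP 479 converted the StopIteration inside the
    # generator expression into RuntimeError.
    print("RuntimeError:", exc)
```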

kindall
You should use the built-in `next` function (`next(i1)`); the `next` method has been renamed `__next__` in Python 3.
adw
And you should do `i1 = iter(i1); i2 = iter(i2)` before looping. `xrange` is iterable, but not itself an iterator.
adw
Good suggestions both. Added.
kindall
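adw's point about calling `iter()` first can be checked directly in Python 3, where `range` plays the role of `xrange`. A small sketch:

```python
from collections.abc import Iterable, Iterator

r = range(3)  # Python 3's range is the counterpart of Python 2's xrange

# range is iterable (iter() works on it) but not an iterator (no __next__).
print(isinstance(r, Iterable), isinstance(r, Iterator))  # True False

try:
    next(r)  # fails: next() needs an iterator
except TypeError as exc:
    print("TypeError:", exc)

it = iter(r)  # iter() hands back an iterator over the range
print(next(it), next(it))  # 0 1
```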