ansaurus

Question

Optimal way to access a value from the last iteration in a loop

Answer 1

+3 A:

Your code is going to be doing the "if not prev" test every time round the loop, even though it only applies to the first element. Also your code seems broken to me - the first time round the loop the prev and current values are the same.

I would do it like this, assuming that there is at least one element:

tmp_iter = iter(tmp)
prev = tmp_iter.next()

for x in tmp_iter: 
   print 'seq: ', x[1], 'prev seq:', prev[1], 'variance: ', x[1]-prev[1]
   prev = x

this could be optimised further by getting rid of the indexing:

tmp_iter = iter(tmp)
[_, prev] = tmp_iter.next()

for [_, x] in tmp_iter: 
   print 'seq: ', x, 'prev seq:', prev, 'variance: ', x-prev
   prev = x

I use the assignment to spit the list into its constituent parts, and assign the first element to _ because it is not used.

Dave Kirby 2009-12-09 20:28:07

Dave thanks for the quick response, actually the code is correct, On the first iteration the variance should actually be 0, i.e nothing before it. Further optimizations such as?

izzy 2009-12-09 20:37:03

Issy, the code is already ok.

culebrón 2009-12-09 20:49:40

the second option here seems the fastest will update my question with avg results

izzy 2009-12-09 22:01:51

Answer 2

A:

This code generates NameError because at if not prev, prev is not defined. Set it to False or None before the cycle. Also you may make a different loop:

for i in xrange(1, len(tmp)):
    print 'seq: {0}, prev seq: {1}, variance: {2}'.format(tmp[i][1], tmp[i - 1][1], tmp[i] - tmp[i - 1][1])

If you'll use 100,000+ records, the bottleneck will be not the cycle, but the memory used by the app. Don't store all the data in such a format: each pair of values (a list) will eat 100+ bytes. If they're in a file, it's better to iterate over it's lines:

(assuming the data is tab-separated)

def reader(filename):
    with open(filename) as f:
        prev = f.next()
        for l in f:
            l = l.split('\t')
            yield (prev, l)
            prev = l

for (prev, curr) in reader(myfile):
    print 'seq: {0}, prev seq: {1}, variance: {2}'.format(curr[1], prev[1], curr[1] - prev[1])

reader is a generator, it returns values from a sequence many times. This way, only 2 lines of data will be stored in memory at any moment, and your app will sustain even millions of rows.

To make the code readable, I put it aside, so that in the program body we dealed with the sequence of data, without caring how it's composed.

culebrón 2009-12-09 20:32:27

Hi culebron, yes in my code i do actually set prev = None (but did not include it here)

izzy 2009-12-09 20:38:39

Answer 3

+4 A:

Just iterate over pairs, using zip(), which is much more readable.

UPDATE: for python 2.x, use itertools.izip instead as it is more efficient!

from itertools import izip
for prev, next in izip(tmp, tmp[1:]):
    print 'seq: ', next[1], 'prev seq:', prev[1], 'variance: ', next[1]-prev[1]

which can also use value unpacking to avoid the index:

for (_, prev), (_, next) in izip(tmp, tmp[1:]):
    print 'seq: ', next, 'prev seq:', prev, 'variance: ', next-prev

Or, if you really need the first iteration too

for prev, next in izip(tmp, tmp[:1] + tmp):
    print 'seq: ', next[1], 'prev seq:', prev[1], 'variance: ', next[1]-prev[1]

EDIT

If you want to avoid the creation of a list in the second argument you can also use an explicit iterator:

itr = iter(tmp)
itr.next() # here I assume tmp is not empty, otherwise an exception will be thrown
for prev, next in izip(tmp, itr):
    print 'seq: ', next[1], 'prev seq:', prev[1], 'variance: ', next[1]-prev[1]

Note: This zip pattern is useful in similar problems too. For example to extract successive triplets from a list:

xs = range(9)
triplets = zip(xs[::3], xs[1::3], xs[2::3]) # python 2.x, zip returns a list

print xs       # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print triplets # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

Also note that in python 3 zip returns an iterator, similar to itertools.izip.

catchmeifyoutry 2009-12-09 20:41:53

thanks for the response, just tested this (Your edited code) and it seems the slowest of all the options (even slower then my original code above)

izzy 2009-12-09 22:01:09

As noted, it might be because zip builds a full list in memory in python 2.x. Anyway, in that case you should use an explicit loop. Too bad, IMHO this is the optimal solution (optimal in the "desired pythonic way" sense).Good luck!

catchmeifyoutry 2009-12-09 22:08:45

SCRAP THAT, python 2.x has `itertools.izip` :p, please time again

catchmeifyoutry 2009-12-09 22:17:25

Answer 4

A:

it = imap(operator.itemgetter(1), tmp) # get all 2nd items
prev = next(it, None) # get 1st element (doesn't throw exception for empty `tmp`)
for x in it:
    print 'seq: %s prev seq: %s variance: %s' % (x, prev, x-prev)
    prev = x

J.F. Sebastian 2009-12-09 22:42:55

If I might return the favour: `for prev in it: break` is written `next(it, None)` nowadays :)

ΤΖΩΤΖΙΟΥ 2009-12-10 00:35:07

@ΤΖΩΤΖΙΟΥ: Thanks. Since Python 2.6 `next(it, None)` is the way.

J.F. Sebastian 2009-12-10 18:18:08

Answer 5

+2 A:

Using itertools:

from itertools import izip, islice
for prev, cur in izip(l, islice(l, 1, None)):
    print 'seq:', cur[1], 'prev seq:', prev[1], 'delta:', cur[1]-prev[1]

For the specific example given in the question, note that, if the numbers can be represented using 32-bit ints, and the list of numbers fits into memory, one of the fastest ways to compute the difference would be to use numpy:

import numpy
a = numpy.array([x[1] for x in tmp])
delta = numpy.diff(a)

Xavier Martinez-Hidalgo 2009-12-10 00:27:53

Answer 6

+1 A:

Guido's time machine to the rescue!

From the itertools recipes page:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)

This should be the most appropriate method (consider the iterable was (random.randint(100) for x in xrange(1000)); here iter(iterable); next(iterable) as a secondary iterator might not provide correct functionality.

Use it in your loop as:

for prev_item, item in pairwise(iterable):
    …

ΤΖΩΤΖΙΟΥ 2009-12-10 00:32:20

ansaurus

tags:

views:

answers:

Optimal way to access a value from the last iteration in a loop

related questions