views:

1626

answers:

11

I have a Python script which takes as input a list of integers, which I need to work with four integers at a time. Unfortunately, I don't have control of the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way:

for i in xrange(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

Still doesn't quite "feel" right, though. :-/

Related question: How do you split a list into evenly sized chunks in Python?

A: 

There doesn't seem to be a pretty way to do this. Here is a page that has a number of methods, including:

def split_seq(seq, size):
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq
Harley
A: 

In your second method, I would advance to the next group of 4 by doing this:

ints = ints[4:]

However, I haven't done any performance measurement so I don't know which one might be more efficient.

Having said that, I would usually choose the first method. It's not pretty, but that's often a consequence of interfacing with the outside world.

Greg Hewgill
+5  A: 
import itertools
def chunks(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
    foo += x1 + x2 + x3 + x4

for chunk in chunks(ints,4):
    foo += sum(chunk)

Another way:

import itertools
def chunks2(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
    foo += x1 + x2 + x3 + x4
MizardX
+1 for using generators, seams like the most "pythonic" out of all suggested solutions
It's rather long and clumsy for something so easy, which isn't very pythonic at all. I prefer S. Lott's version
zenazn
+11  A: 

I'm a fan of

chunkSize= 4
for i in xrange(0, len(ints), chunkSize):
    chunk = ints[i:i+chunkSize]
    # process chunk of size <= chunkSize
S.Lott
+27  A: 
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

Simple. Easy. Fast. Works with any sequence:

text = "I am a very, very helpful text"

for group in chunker(text, 7):
   print repr(group),
# 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt'

print '|'.join(chunker(text, 10))
# I am a ver|y, very he|lpful text

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']

for group in chunker(animals, 3):
    print group
# ['cat', 'dog', 'rabbit']
# ['duck', 'bird', 'cow']
# ['gnu', 'fish']
nosklo
Clear and compact. Very pythonic. :-)
Ben Blank
@Carlos Crasborn's version works for any iterable (not just sequences as the above code); it is concise and probably just as fast or even faster. Though it might be a bit obscure (unclear) for people unfamiliar with `itertools` module.
J.F. Sebastian
@J.F. Sebastian — Now that I've gotten the chance to figure out *why* his code works, I feel compelled to change my accepted answer (which I *hate* doing). I love this answer, too, @nosklo, but that izip_longest trick seems tailor-made for my situation.
Ben Blank
Agreed. This is the most generic and pythonic way. Clear and concise. (and works on app engine)
Matt Williamson
A: 

If the list is large, the highest-performing way to do this will be to use a generator:

def get_chunk(iterable, chunk_size):
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == chunk_size:
            yield tuple(result)
            result = []
    if len(result) > 0:
        yield tuple(result)

for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
    print x

(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)
Robert Rossney
(I think that MizardX's itertools suggestion is functionally equivalent to this.)
Robert Rossney
(Actually, on reflection, no I don't. itertools.islice returns an iterator, but it doesn't use an existing one.)
Robert Rossney
A: 

If the lists are the same size, you can combine them into lists of 4-tuples with zip(). For example:

# Four lists of four elements each.

l1 = range(0, 4)
l2 = range(4, 8)
l3 = range(8, 12)
l4 = range(12, 16)

for i1, i2, i3, i4 in zip(l1, l2, l3, l4):
    ...

Here's what the zip() function produces:

>>> print l1
[0, 1, 2, 3]
>>> print l2
[4, 5, 6, 7]
>>> print l3
[8, 9, 10, 11]
>>> print l4
[12, 13, 14, 15]
>>> print zip(l1, l2, l3, l4)
[(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]

If the lists are large, and you don't want to combine them into a bigger list, use itertools.izip(), which produces an iterator, rather than a list.

from itertools import izip

for i1, i2, i3, i4 in izip(l1, l2, l3, l4):
    ...
Brian Clapper
+4  A: 
from itertools import izip_longest

def chunker(iterable, chunksize, filler):
    return izip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)
Pedro Henriques
+1 iterators and conciseness.
MizardX
A readable way to do it is http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks#434411
J.F. Sebastian
I've removed spaces around '=' in the arguments list (see PEP8).
J.F. Sebastian
+13  A: 

Modified from the recipes section of Python's itertools docs:

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

Example
In pesudocode to keep the example terse.

grouper('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'

Note: izip_longest is new to Python 2.6

Craz
I know it is taken literally from documentation but I'd change the order of parameters: `grouper(iterable, chunksize)` and `izip_longest(*args, fillvalue=fillvalue)`
J.F. Sebastian
Very nice! Probably the most compact method here, considering it even combines chunking and padding. Unfortunately, it's pretty opaque. Even having read up on izip_longest, I'm still not sure how this works. :-/
Ben Blank
@J.F. Sebastian: Thanks. That does follow common convention.
Craz
Finally got a chance to play around with this in a python session. For those who are as confused as I was, this is feeding the same iterator to izip_longest multiple times, causing it to consume successive values of the same sequence rather than striped values from separate sequences. I love it!
Ben Blank
What's the best way to filter back out the fillvalue? ([item for item in items if item is not fillvalue] for items in grouper(iterable))?
gotgenes
+1  A: 

Since nobody's mentioned it yet here's a zip() solution:

>>> def chunker(iterable, chunksize):
...     return zip(*[iter(iterable)]*chunksize)

It works only if your sequence's length is always divisible by the chunk size or you don't care about a trailing chunk if it isn't.

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

Or using itertools.izip to return an iterator instead of a list:

>>> from itertools import izip
>>> def chunker(iterable, chunksize):
...     return izip(*[iter(iterable)]*chunksize)

Padding can be fixed using @ΤΖΩΤΖΙΟΥ's answer:

>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
...     it   = chain(iterable, repeat(fillvalue, chunksize-1))
...     args = [it] * chunksize
...     return izip(*args)
J.F. Sebastian
A: 

Using itertools and iter, this simple function works both for sequences (tuples, lists) and iterables (no padding):

def grouper(n, it):
  """grouper(3, 'ABCDEFG') --> ABC DEF G"""
  return iter(lambda: list(itertools.islice(it, n)), [])

>>> list(grouper(2, iter([1,2,3,4,5])))
[[1,2], [3,4], [5]]
tokland