I have a list of arbitrary length, and I need to split it into equal-size chunks and operate on each chunk. There are some obvious ways to do this, such as keeping a counter and two lists: when the second list fills up, append it to the first list and empty it for the next round of data. But this is potentially extremely expensive.

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

This should work:

l = range(1, 1000)
print chunks(l, 10) -> [ [ 1..10 ], [ 11..20 ], .., [ 991..999 ] ]

I looked through itertools but couldn't find anything obviously suited to this. I might have missed it, though.

Related question: What is the most “pythonic” way to iterate over a list in chunks?

+42  A: 

Here's a generator that yields the chunks you want:

def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

import pprint
pprint.pprint(list(chunks(range(75), 10)))

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
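
For anyone reading this on Python 3, where xrange no longer exists, the same generator can be sketched with range (which is lazy there). This is only a port of the code above, with a small input chosen for illustration:

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    # range() is lazy in Python 3, so no list of indices is built up front.
    for i in range(0, len(l), n):
        yield l[i:i + n]

print(list(chunks(list(range(7)), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```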
Ned Batchelder
I would avoid using xrange if porting to Python 3.0 is considered possible, since xrange was removed from Python 3.0.
atzz
What happens if we can't tell the length of the list? Try this on itertools.repeat([ 1, 2, 3 ]), e.g.
jespern
That's an interesting extension to the question, but the original question clearly asked about operating on a list.
Ned Batchelder
The 2to3 porting program changes all xrange calls to range, since in Python 3.0 range behaves like xrange (i.e. it returns a lazy sequence object rather than building a full list). So in Python 2 code I would keep using xrange rather than range.
Tomi Kyöstilä
Excellent answer, and much nicer than what I had come up with.
I82Much
A: 

If you know the list size:

def SplitList( lst, chunk_size ) :
    # parameter named lst so that the built-in list is not shadowed
    return [lst[offs:offs+chunk_size] for offs in range(0, len(lst), chunk_size)]

If you don't (an iterator):

def IterChunks( sequence, chunk_size ) :
    res = []
    for item in sequence :
        res.append(item)
        if len(res) >= chunk_size :
            yield res
            res = []
    if res : yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).
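
One possible shape for that nicer version, sketched for Python 3 (where the built-in zip is lazy; in Python 2, itertools.izip plays the same role). The name iter_full_chunks is made up for this illustration. Note that it yields tuples rather than lists and silently drops a trailing incomplete chunk, which is why it is only safe under the whole-number-of-chunks assumption:

```python
def iter_full_chunks(sequence, chunk_size):
    # Hypothetical helper name, not part of the answer above.
    # The SAME iterator object is passed to zip chunk_size times, so each
    # output tuple is filled with chunk_size consecutive items.
    it = iter(sequence)
    return zip(*[it] * chunk_size)

print(list(iter_full_chunks(range(6), 3)))  # [(0, 1, 2), (3, 4, 5)]
print(list(iter_full_chunks(range(7), 3)))  # [(0, 1, 2), (3, 4, 5)]  (6 is lost)
```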

atzz
+6  A: 

Here is a generator that works on arbitrary iterables:

import itertools

def split_seq(iterable, size):
  it = iter(iterable)
  item = list(itertools.islice(it, size))
  while item:
    yield item
    item = list(itertools.islice(it, size))

Example:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
MizardX
+2  A: 

heh, one line version

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
slav0nic
Please, use "def chunk" instead of "chunk = lambda". It works the same. One line. Same features. MUCH easier for the n00bz to read and understand.
S.Lott
+33  A: 

Directly from the Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

An alternate take, as suggested by J.F.Sebastian:

from itertools import izip_longest

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.
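
For readers on Python 3, a sketch of the same recipe: izip and izip_longest exist only in Python 2, and the padding variant is spelled zip_longest in Python 3's itertools. The argument order below simply mirrors the code above:

```python
from itertools import zip_longest

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    # One iterator repeated n times; zip_longest pads the final group.
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)

print(list(grouper(3, 'abcdefg', 'x')))
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```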

ΤΖΩΤΖΙΟΥ
"Use the libraries, Luke!" :)
Kevin Little
It is `izip_longest(*[iter(iterable)]*n, fillvalue=fillvalue)` nowadays.
J.F. Sebastian
Thanks, J.F. Love your dolls!
ΤΖΩΤΖΙΟΥ
+1  A: 
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

Usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for piece in split_seq(seq, 3):
    print piece
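
Note that this one takes a number of pieces rather than a chunk size. A Python 3 rendering of the same function (only range and print() are swapped in) shows what the usage produces:

```python
def split_seq(seq, num_pieces):
    start = 0
    for i in range(num_pieces):
        # seq[i::num_pieces] is used only to compute the size of piece i;
        # this spreads any remainder over the first pieces.
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

for piece in split_seq([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3):
    print(piece)
# [1, 2, 3, 4]
# [5, 6, 7]
# [8, 9, 10]
```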
Corey Goldberg
A: 
(explicit)

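def chunk(lst):
    # Splits lst into the smallest number (>= 2) of equal-sized pieces.
    # Note: lst is consumed (emptied) in the process.
    out = []
    factor = len(lst)  # fallback so lists of length 0 or 1 don't hit a NameError
    for x in xrange(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) // x
            break
    while lst:
        out.append([lst.pop(0) for _ in xrange(factor)])
    return out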
J.T. Hurley
A: 
>>> f = lambda x, n, acc=[]: f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>>

If you are into brackets - I picked up a book on Erlang :)

hc
A: 

I have a question about the following code.

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

Why does izip receive *[chain(...)] * n? Why does it repeat the iterator n times?

rafael
All chain objects in the list are actually references to the same object. So, calling .next() on one of them affects the state of the rest. If we have [a,b,c], izip constructs each element somewhat like this: [a.next(), b.next(), c.next()], which is equal to: [a.next(), a.next(), a.next()]. And 'a' is a chain, so after all the elements of 'a' are exhausted, it will start calling .next() on the repeat object, filling with the pad values, no more than n-1 times.
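
A small Python 3 sketch of that explanation (zip stands in for izip here, since izip is Python 2 only; the variable names are just for illustration):

```python
from itertools import chain, repeat

it = iter('abcdefg')
same = [it] * 3   # three references to ONE iterator, not three copies
print(same[0] is same[1] is same[2])   # True

# zip asks each argument in turn for its next item; because all three
# arguments are the same iterator, each tuple holds consecutive items.
groups = list(zip(*same))
print(groups)   # [('a', 'b', 'c'), ('d', 'e', 'f')]  ('g' is dropped)

# Chaining n-1 pad values after the data lets the last group be completed.
padded = list(zip(*[chain('abcdefg', repeat('x', 2))] * 3))
print(padded)   # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```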
slack3r
+6  A: 

If you want something super simple:

def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]
oremj
A: 

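Without calling len(), relying only on slicing (len() is already O(1) for built-in lists, so this mainly helps with sequence-like objects that support slicing but whose length is costly or unavailable):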

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables:

from itertools import islice

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

The functional flavour of the above:

from itertools import imap, islice, repeat, takewhile

def isplitter2(l, n):
    return takewhile(lambda x: x,
                     imap(lambda item: list(islice(item, n)),
                          repeat(iter(l))))
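
A rough Python 3 equivalent of the functional version, assuming the built-in map (lazy in Python 3) as a stand-in for imap, which no longer exists there:

```python
from itertools import islice, repeat, takewhile

def isplitter2(l, n):
    # repeat(it) hands out the same iterator over and over; each islice
    # call takes the next n items from it. takewhile stops at the first
    # empty chunk, i.e. once the input is exhausted.
    it = iter(l)
    return takewhile(bool, map(lambda same_it: list(islice(same_it, n)), repeat(it)))

print(list(isplitter2(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```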
Mars
+1  A: 
def chunk(iterable, size):
    # Python 2 only: map(None, ...) zips its arguments and pads the last chunk with None
    return map(None, *([iter(iterable)] * size))
Tomasz Wysocki
+2  A: 

Simple yet elegant

l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]

or if you prefer:

chunks = lambda l, n: [l[x: x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)
lebenf