views:

1316

answers:

12

How do I process the elements of a sequence in batches, idiomatically?

For example, with the sequence "abcdef" and a batch size of 2, I would like to do something like the following:

for x, y in "abcdef":
    print "%s%s\n" % (x, y)
ab
cd
ef

Of course, this doesn't work because it is expecting a single element from the list which itself contains 2 elements.

What is a nice, short, clean, pythonic way to process the next n elements of a list in a batch, or sub-strings of length n from a larger string (two similar problems)?

+1  A: 

Responses to this question show a few methods.

Steve B.
Accepted! (for lists...). This seems shortest for lists (which I was also asking about), but slices seems to work better for strings, which was the example I actually used. A Starbucks voucher is in the post...
A: 

One solution, although I challenge someone to do better ;-)

a = 'abcdef'
b = [[a[i-1], a[i]] for i in range(1, len(a), 2)]

for x, y in b:
  print "%s%s\n" % (x, y)
jcoon
+3  A: 

I am sure someone is going to come up with some more "Pythonic" but how about:

for y in range(0, len(x), 2):
    print "%s%s" % (x[y], x[y+1])

Note that this would only work if you know that len(x) % 2 == 0;

Paolo Bergantino
start the range at 1 and then using x[y-1] will work for len(x)%2 == 1
jcoon
This answer seems simplest to me, accepted! -- with this slight modification which makes it shorter when handling batches > 2: for i in range(0, len(s), 2): print s[i:i+2]
though this answer is neither quite pythonic nor generic
rpr
It solved the OP's problem in a short and simple way. Your answer may be the most pythonic (and I even noted in my answer that it isn't pythonic) but that's hardly a reason for a downvote...
Paolo Bergantino
Downvote is neither necessary nor relevant, I agree. However, the problem is stated as "What is a nice, short, clean, pythonic way..?" and not as "a short and simple" solution. And independent votes reflect the quality of answers as how they match and satisfy the stated question. In this case, the OP chose what he/she thinks to satisfy the need and that is it. Though, I don't quite agree with that, still. Thanks...
rpr
Err.. right. Let's just agree to disagree.
Paolo Bergantino
+2  A: 

you can create the following generator

def chunks(seq, size):
    a = range(0, len(seq), size)
    b = range(size, len(seq) + 1, size)
    for i, j in zip(a, b):
        yield seq[i:j]

and use it like this:

for i in chunks('abcdef', 2):
    print(i)
SilentGhost
+12  A: 

A generator function would be neat:

def batch_gen(data, batch_size):
    for i in range(0, len(data), batch_size):
            yield data[i:i+batch_size]

Example use:

a = "abcdef"
for i in batch_gen(a, 2): print i

prints:

ab
cd
ef
rpr
+4  A: 

Don't forget about the zip() function:

a = 'abcdef'
for x,y in zip(a[::2], a[1::2]):
  print '%s%s' % (x,y)
jcoon
a very elegant solution!
culebrón
+4  A: 

but the more general way would be (inspired by this answer):

for i in zip(*(seq[i::size] for i in range(size))):
    print(i)                            # tuple of individual values
SilentGhost
+1 for elegant answer! But, there is one ")" too much in the end of the for-line
kigurai
+4  A: 

I've got an alternative approach, that works for iterables that don't have a known length.

   
def groupsgen(seq, size):
    it = iter(seq)
    while True:
        values = ()        
        for n in xrange(size):
            values += (it.next(),)        
        yield values    

It works by iterating over the sequence (or other iterator) in groups of size, collecting the values in a tuple. At the end of each group, it yield the tuple.

When the iterator runs out of values, it produces a StopIteration exception which is then propagated up, indicating that groupsgen is out of values.

It assumes that the values come in sets of size (sets of 2, 3, etc). If not, any values left over are just discarded.

Silverfish
A: 

How about itertools?

from itertools import islice, groupby

def chunks_islice(seq, size):
    while True:
        aux = list(islice(seq, 0, size))
        if not aux: break
        yield "".join(aux)

def chunks_groupby(seq, size):
    for k, chunk in groupby(enumerate(seq), lambda x: x[0] / size):
        yield "".join([i[1] for i in chunk])
To make code-blocks, you indent the code by four spaces (instead of using <pre><code> tags), the "101010" button in the editor toolbar does this for the selected text too
dbr
A: 
>>> a = "abcdef"
>>> size = 2
>>> [a[x:x+size] for x in range(0, len(a), size)]
['ab', 'cd', 'ef']

..or, not as a list comprehension:

a = "abcdef"
size = 2
output = []
for x in range(0, len(a), size):
    output.append(a[x:x+size])

Or, as a generator, which would be best if used multiple times (for a one-use thing, the list comprehension is probably "best"):

def chunker(thelist, segsize):
    for x in range(0, len(thelist), segsize):
            yield thelist[x:x+segsize]

..and it's usage:

>>> for seg in chunker(a, 2):
...     print seg
... 
ab
cd
ef
dbr
A: 

And then there's always the documentation.

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    try:
        b.next()
    except StopIteration:
        pass
    return izip(a, b)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

Note: these produce tuples instead of substrings, when given a string sequence as input.

ΤΖΩΤΖΙΟΥ
A: 

s = 'abcdefgh'
for e in (s[i:i+2] for i in range(0,len(s),2)):
  print(e)