views:

267

answers:

4

I'm trying to write the Haskel function 'splitEvery' in Python. Here is it's definition:

splitEvery :: Int -> [e] -> [[e]]
    @'splitEvery' n@ splits a list into length-n pieces.  The last
    piece will be shorter if @n@ does not evenly divide the length of
    the list.

The basic version of this works fine, but I want a version that works with generator expressions, lists, and iterators. And, if there is a generator as an input it should return a generator as an output!

Tests

# should not enter infinite loop with generators or lists
splitEvery(itertools.count(), 10)
splitEvery(range(1000), 10)

# last piece must be shorter if n does not evenly divide
assert splitEvery(5, range(9)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

# should give same correct results with generators
tmp = itertools.islice(itertools.count(), 10)
assert list(splitEvery(5, tmp)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

Current Implementation

Here is the code I currently have but it doesn't work with a simple list.

def splitEvery_1(n, iterable):
    res = list(itertools.islice(iterable, n))
    while len(res) != 0:
        yield res
        res = list(itertools.islice(iterable, n))

This one doesn't work with a generator expression (thanks to jellybean for fixing it):

def splitEvery_2(n, iterable): 
    return [iterable[i:i+n] for i in range(0, len(iterable), n)]

There has to be a simple piece of code that does the splitting. I know I could just have different functions but it seems like it should be and easy thing to do. I'm probably getting stuck on an unimportant problem but it's really bugging me.


It is similar to grouper from http://docs.python.org/library/itertools.html#itertools.groupby but I don't want it to fill extra values.

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

It does mention a method that truncates the last value. This isn't what I want either.

The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using izip([iter(s)]n).

list(izip(*[iter(range(9))]*5)) == [[0, 1, 2, 3, 4]]
# should be [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
+1  A: 

Why not do it like this? Looks almost like your splitEvery_2 function.

def splitEveryN(n, it):
    return [it[i:i+n] for i in range(0, len(it), n)]

Actually it only takes away the unnecessary step interval from the slice in your solution. :)

jellybean
That was actually what I meant with my `splitEvery_2` function. It doesn't work if you input a generator expression. I think I will probably just convert my generator to a list to make things simple, but the answer will still bug me.
James Brooks
Iterators don't support the `len` function, although a list or a tuple would. For example `len(itertools.imap(lambda x:x*2, range(3)))` will fail.
Cristian Ciupitu
A: 

Here is how you deal with list vs iterator:

def isList(L): # Implement it somehow - returns True or false
...
return (list, lambda x:x)[int(islist(L))](result)
Hamish Grubijan
don't understand how that solves my problem sorry.
James Brooks
+5  A: 
from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

Some tests:

>>> list(split_every(5, range(9)))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]

>>> list(split_every(3, (x**2 for x in range(20))))
[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]

>>> [''.join(s) for s in split_every(6, 'Hello world')]
['Hello ', 'world']

>>> list(split_every(100, []))
[]
Roberto Bonvallet
perfect, thanks.
James Brooks
+1  A: 

I think those questions are almost equal

Changing a little bit to crop the last, I think a good solution for the generator case would be:

from itertools import *
def iter_grouper(n, iterable):
    it = iter(iterable)
    item = itertools.islice(it, n)
    while item:
        yield item
        item = itertools.islice(it, n)

for the object that supports slices (lists, strings, tuples), we can do:

def slice_grouper(n, sequence):
   return [sequence[i:i+n] for i in range(0, len(sequence), n)]

now it's just a matter of dispatching the correct method:

def grouper(n, iter_or_seq):
    if hasattr(iter_or_seq, "__getslice__"):
        return slice_grouper(n, iter_or_seq)
    elif hasattr(iter_or_seq, "__iter__"):
        return iter_grouper(n, iter_or_seq)

I think you could polish it a little bit more :-)

fortran
It is similar, and I *do* still want the last chunk. I just want it to work with generators as well as lists.
James Brooks
oh, sorry, I misunderstood that part then... I'll fix it
fortran
I did think about this but I thought there had to be a simpler way than `hasattr`. Roberto Bonvallet posted it so he gets the answer. That said yours appears to work +1.
James Brooks