I'm looking for a way to "page through" a Python iterator. That is, I would like to wrap a given iterator iter and a page_size with another iterator that would return the items from iter as a series of "pages". Each page would itself be an iterator yielding up to page_size items.

I looked through itertools and the closest thing I saw is itertools.islice. In some ways, what I'd like is the opposite of itertools.chain -- instead of chaining a series of iterators together into one iterator, I'd like to break an iterator up into a series of smaller iterators. I was expecting to find a paging function in itertools but couldn't locate one.

I came up with the following pager class and demonstration.

import itertools

class pager(object):
    """
    takes the iterable iter and page_size to create an iterator that "pages through" iter.  That is, pager returns a series of page iterators,
    each returning up to page_size items from iter.
    """
    def __init__(self,iter, page_size):
        self.iter = iter
        self.page_size = page_size
    def __iter__(self):
        return self
    def next(self):
        # if self.iter has not been exhausted, return the next slice
        # I'm using a technique from 
        # http://stackoverflow.com/questions/1264319/need-to-add-an-element-at-the-start-of-an-iterator-in-python
        # to check for iterator completion by cloning self.iter into 3 copies:
        # 1) self.iter gets advanced to the next page
        # 2) peek is used to check on whether self.iter is done
        # 3) iter_for_return is to create an independent page of the iterator to be used by caller of pager
        self.iter, peek, iter_for_return = itertools.tee(self.iter, 3)
        try:
            next_v = next(peek)
        except StopIteration: # catch the exception and then raise it
            raise StopIteration
        else:
            # consume the page from the iterator so that the next page is up in the next iteration
            # is there a better way to do this?
            # 
            for i in itertools.islice(self.iter,self.page_size): pass
            return itertools.islice(iter_for_return,self.page_size)



iterator_size = 10
page_size = 3

my_pager = pager(xrange(iterator_size),page_size)

# skip a page, then print out rest, and then show the first page
page1 = my_pager.next()

for page in my_pager:
    for i in page:
        print i
    print "----"

print "skipped first page: " , list(page1)   

I'm looking for some feedback and have the following questions:

  1. Is there something already in itertools that serves as a pager and that I'm overlooking?
  2. Cloning self.iter 3 times seems kludgy to me. One clone is to check whether self.iter has any more items. I decided to go with a technique Alex Martelli suggested (aware that he wrote of a wrapping technique). The second clone was to enable the returned page to be independent of the internal iterator (self.iter). Is there a way to avoid making 3 clones?
  3. Is there a better way to deal with the StopIteration exception besides catching it and then raising it again? I am tempted not to catch it at all and to let it bubble up.
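
One way to sidestep the three tee clones, sketched here in Python 3 (the code above is Python 2), is to pull each page off the underlying iterator with itertools.islice and copy it into a list before handing back an iterator over it. The class name Pager and its attributes are illustrative and not part of itertools. Because each page is materialized, setting a page aside and reading it later still works, at the cost of holding one page in memory at a time.

```python
import itertools


class Pager:
    """Illustrative sketch: yield successive pages of up to page_size items."""

    def __init__(self, iterable, page_size):
        self._it = iter(iterable)
        self.page_size = page_size

    def __iter__(self):
        return self

    def __next__(self):
        # Take up to page_size items; an empty list means the source is done.
        page = list(itertools.islice(self._it, self.page_size))
        if not page:
            raise StopIteration
        # Return an iterator so each page behaves like the original pager's pages.
        return iter(page)


my_pager = Pager(range(10), 3)
page1 = next(my_pager)                     # set the first page aside
rest = [list(page) for page in my_pager]   # [[3, 4, 5], [6, 7, 8], [9]]
first = list(page1)                        # [0, 1, 2]
```

No peeking is needed in this version: the empty-page check stands in for the peek clone, and StopIteration is raised exactly once, where the source runs dry.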

Thanks! -Raymond

+3  A: 

Look at grouper() in the itertools recipes.
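
For reference, a Python 3 rendering of that recipe looks roughly like this. In Python 2 the function is itertools.izip_longest, and the argument order of grouper has varied between versions of the documentation, so treat the signature here as an assumption.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    # n references to the same iterator, so zip_longest pulls n items per tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


chunks = list(grouper("ABCDEFG", 3, "x"))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```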

Ignacio Vazquez-Abrams
Thanks for pointing out the recipes. I can see using grouper, because it's efficient, and adapting the recipe to behave exactly like my Pager. I'm still curious whether Pager as it stands has much merit, or whether I should abandon it for a grouper-like approach.
Raymond Yee
A: 

Based on the pointer to the itertools recipe for grouper(), I came up with the following adaptation of grouper() to mimic Pager. I wanted to filter out any None results and to return an iterator rather than a tuple (though I suspect that there might be little advantage in doing this conversion).

# based on http://docs.python.org/library/itertools.html#recipes
from itertools import izip_longest  # Python 2 name; zip_longest in Python 3

def grouper2(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    for item in izip_longest(fillvalue=fillvalue, *args):
        yield iter(filter(None,item))

I'd welcome feedback on what I can do to improve this code.

Raymond Yee
+1  A: 

I'd do it like this:

from itertools import izip_longest

def pager(iterable, page_size):
    args = [iter(iterable)] * page_size
    fillvalue = object()
    for group in izip_longest(fillvalue=fillvalue, *args):
        yield (elem for elem in group if elem is not fillvalue)

That way, None can be a legitimate value that the iterator spits out. Only the single object fillvalue is filtered out, and it cannot possibly be an element of the iterable.
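
A Python 3 sketch of the same idea (zip_longest replaces izip_longest; the names pager_py3 and _MISSING are made up for this example) shows that None and other falsy values such as 0 survive, whereas filter(None, ...) would drop them.

```python
from itertools import zip_longest

_MISSING = object()  # unique sentinel; no item from the iterable can be this object


def pager_py3(iterable, page_size):
    args = [iter(iterable)] * page_size
    for group in zip_longest(*args, fillvalue=_MISSING):
        # Strip only the padding added by zip_longest.
        yield (elem for elem in group if elem is not _MISSING)


pages = [list(page) for page in pager_py3([0, None, 1, 2, None], 2)]
# [[0, None], [1, 2], [None]]

falsy_dropped = list(filter(None, (0, None, 1)))
# [1]
```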

Matt Anderson
Thanks, Matt. You made me realize that I was neither allowing None to be a legitimate value from the iterator nor accounting for the fillvalue.
Raymond Yee
A: 

Why aren't you using this?

def grouper( page_size, iterable ):
    page= []
    for item in iterable:
        page.append( item )
        if len(page) == page_size:
            yield page
            page= []
    yield page

"Each page would itself be an iterator with up to page_size" items. Each page is a simple list of items, which is iterable. You could use yield iter(page) to yield the iterator instead of the object, but I don't see how that improves anything.

It throws a standard StopIteration at the end.

What more would you want?
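
One edge case may be worth checking: as written, the final yield runs even when the last page is empty, so an input whose length is an exact multiple of page_size (or an empty input) ends with an empty list. If that is unwanted, a small variation guards the last yield. This is a sketch in Python 3, and the name paged_lists is hypothetical.

```python
def paged_lists(page_size, iterable):
    page = []
    for item in iterable:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    # Only emit the trailing page if it actually holds items.
    if page:
        yield page


exact = list(paged_lists(3, range(6)))    # [[0, 1, 2], [3, 4, 5]]
ragged = list(paged_lists(3, range(7)))   # [[0, 1, 2], [3, 4, 5], [6]]
empty = list(paged_lists(3, []))          # []
```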

S.Lott
Thanks for answering my question and providing a good way to think about how to just loop through the iterator. I think that there is a small error -- did you mean to append the item to the page -- as in:

def grouper(page_size, iterable):
    page = []
    for item in iterable:
        if len(page) == page_size:
            yield page
            page = []
        else:
            page.append(item)
    yield page
Raymond Yee
@raymondyee: Actually, there's a better way. Your version harbors a bug. Try it and you'll see that it skips an item.
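
To make the skipped item concrete, here is the variant from the comment above, written out in Python 3 under the hypothetical name grouper_misplaced_append. When the page is full, the current item is neither appended nor carried over, so it is lost.

```python
def grouper_misplaced_append(page_size, iterable):
    page = []
    for item in iterable:
        if len(page) == page_size:
            yield page
            page = []          # the current item is dropped here
        else:
            page.append(item)
    yield page


result = list(grouper_misplaced_append(3, range(10)))
# [[0, 1, 2], [4, 5, 6], [8, 9]]  (3 and 7 are missing)
```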
S.Lott
@S.Lott -- yes, of course, I put my page.append(item) in the wrong place. Thanks for the correction. I'm still learning about when itertools can help and when there's no need for it. Any guidelines to offer?
Raymond Yee
@raymondyee: No advice. I don't use itertools all that often. Generator functions are very simple.
S.Lott