Let's say I have a generator function like this:

import random
def big_gen():
  i = 0
  group = 'a'
  while group != 'd':
    i += 1
    yield (group, i)
    if random.random() < 0.20:
      group = chr(ord(group) + 1)

Example output might be: ('a', 1), ('a', 2), ('a', 3), ('a', 4), ('a', 5), ('a', 6), ('a', 7), ('a', 8), ('b', 9), ('c', 10), ('c', 11), ('c', 12), ('c', 13)

I would like to break this into three groups: Group A, Group B, and Group C. And I would like a generator for each group. Then I'd pass the generator and the group letter into a subfunction. An example of the subfunction:

def printer(group_letter, generator):
  print "These numbers are in group %s:" % group_letter
  for num in generator:
    print "\t%s" % num

The desired output would be:

These numbers are in group a:
1
2
3
4
5
6
7
8
These numbers are in group b:
9
These numbers are in group c:
10
11
12
13

How can I do this without changing big_gen() or printer(), and avoid storing the entire group in memory at once? (In real life, the groups are huge)

A: 

You have a slight problem here. You'd like printer() to take a separate generator for each group, but in reality you have a single generator yielding all groups. You have two options, as I see it:

1) Change big_gen() to yield generators:

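import random

def big_gen():
  # Python 2 has no `nonlocal`, so shared state lives in a mutable dict.
  state = {'i': 0, 'group': 'a'}
  def gen(letter):
    # Yield numbers for as long as `letter` is still the current group.
    while state['group'] == letter:
      state['i'] += 1
      yield state['i']
      if random.random() < 0.20:
        state['group'] = chr(ord(letter) + 1)
  while state['group'] != 'd':
    # Each inner generator must be exhausted before the next is requested.
    yield state['group'], gen(state['group'])

# Drive the outer generator with a plain loop so printer() actually runs.
for letter, numbers in big_gen():
  printer(letter, numbers)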

2) Change printer() to keep state and notice when the group changes (keeping your original big_gen() function):

def printer(generator):
  group = None
  for grp, num in generator:
    if grp != group:
      print "These numbers are in group %s:" % grp
      group = grp
    print "\t%s" % num
Cide
A: 

Sure, this does what you want:

import itertools
import operator

def main():
  for let, gen in itertools.groupby(big_gen(), key=operator.itemgetter(0)):
    secgen = itertools.imap(operator.itemgetter(1), gen)
    printer(let, secgen)

groupby does the bulk of the work here -- the key= just tells it what field to group by.
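
To see what groupby hands back, here is a minimal sketch on a fixed list standing in for big_gen()'s random output. The sample data and the name `pairs` are made up for illustration, and the code is written to run on both Python 2 and Python 3:

```python
import itertools
import operator

# Hypothetical fixed sample standing in for big_gen()'s random output.
pairs = [('a', 1), ('a', 2), ('b', 3), ('c', 4), ('c', 5)]

grouped = []
for letter, items in itertools.groupby(pairs, key=operator.itemgetter(0)):
    # `items` is a lazy iterator over the original tuples. Consume it
    # before moving on: advancing groupby makes the previous group
    # unavailable, because both share the same underlying iterator.
    grouped.append((letter, list(items)))

print(grouped)
# [('a', [('a', 1), ('a', 2)]), ('b', [('b', 3)]), ('c', [('c', 4), ('c', 5)])]
```

The list(items) call is there only to make the result printable. In the real use case you would pass the iterator straight on, so no group is ever held in memory as a whole.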

The resulting iterator needs to be wrapped in an imap only because your printer signature takes an iterator over numbers, whereas groupby by nature returns iterators over the same items it receives as input (here, 2-item tuples of a letter followed by a number). That detail is not really germane to your question's title, though.
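
On Python 3, where itertools.imap no longer exists, a generator expression does the same unwrapping lazily. The sketch below is an illustration rather than part of the original answer: `source`, `consumer`, and `events` are hypothetical stand-ins for big_gen(), printer(), and a trace log, used to check that items are consumed one at a time instead of being buffered per group:

```python
import itertools
import operator

events = []  # records the order in which items are produced and consumed

def source():
    # Hypothetical deterministic stand-in for big_gen().
    for pair in [('a', 1), ('a', 2), ('b', 3)]:
        events.append('produce %s%d' % pair)
        yield pair

def consumer(group_letter, numbers):
    # Same signature as printer(), but records instead of printing.
    for num in numbers:
        events.append('consume %s%d' % (group_letter, num))

for letter, items in itertools.groupby(source(), key=operator.itemgetter(0)):
    # Lazily strip the letter from each (letter, number) tuple.
    numbers = (num for _, num in items)
    consumer(letter, numbers)

print(events)
# ['produce a1', 'consume a1', 'produce a2', 'consume a2',
#  'produce b3', 'consume b3']
```

Production and consumption alternate strictly, which is what keeps memory use flat even when a group is huge.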

The answer to that title is that, yep, a Python function can perfectly well do the job you want, and itertools.groupby in fact does exactly that. I recommend studying the itertools module carefully; it's a very useful tool (and delivers splendid performance as well).

Alex Martelli