I have a number of Python generators, which I want to combine into a new generator. I can easily do this by a hand-written generator using a bunch of yield statements.

On the other hand, the itertools module is made for things like this and to me it seems as if the pythonic way to create the generator I need is to plug together various iterators of that itertools module.

However, in the problem at hand, this soon gets quite complicated: the generator needs to maintain some state (e.g. whether the first or a later item is being processed), the i-th output further depends on conditions on the i-th input items, and the various input lists have to be processed differently before they are joined into the generated list.

Because the composition of standard iterators that would solve my problem is nearly incomprehensible (due to the one-dimensional nature of written source code), I wonder whether there are any advantages to using standard itertools generators over hand-written generator functions, in basic and in more advanced cases. Actually, I think that in 90% of cases the hand-written versions are much easier to read, probably because of their more imperative style compared to the functional style of chaining iterators.

EDIT

To illustrate my problem, here is a (toy) example: Let a and b be two iterables of the same length (the input data). The items of a are integers; the items of b are themselves iterables whose individual items are strings. The output should correspond to the output of the following generator function:

from itertools import *
def generator(a, b):
    first = True
    for i, s in izip(a, b):
        if first:
            yield "First line"
            first = False
        else:
            yield "Some later line"
        if i == 0:
            yield "The parameter vanishes."
        else:
            yield "The parameter is:"
            yield i
        yield "The strings are:"
        comma = False
        for t in s:
            if comma:
                yield ','
            else:
                comma = True
            yield t

If I write down the same program in functional style using generator expressions and the itertools module, I end up with something like:

from itertools import *
def generator2(a, b):
    return (z for i, s, c in izip(a, b, count())
            for y in (("First line" if c == 0 else "Some later line",),
                      ("The parameter vanishes.",) if i == 0
                      else ("The parameter is:", i),
                      ("The strings are:",),
                      islice((x for t in s for x in (',', t)), 1, None))
            for z in y)

EXAMPLE

>>> a, b = (1, 0, 2), ("ab", "cd", "ef")
>>> print([x for x in generator(a, b)])
['First line', 'The parameter is:', 1, 'The strings are:', 'a', ',', 'b', 'Some later line', 'The parameter vanishes.', 'The strings are:', 'c', ',', 'd', 'Some later line', 'The parameter is:', 2, 'The strings are:', 'e', ',', 'f']
>>> print([x for x in generator2(a, b)])
['First line', 'The parameter is:', 1, 'The strings are:', 'a', ',', 'b', 'Some later line', 'The parameter vanishes.', 'The strings are:', 'c', ',', 'd', 'Some later line', 'The parameter is:', 2, 'The strings are:', 'e', ',', 'f']

This is possibly more elegant than my first solution but it looks like a write-once-do-not-understand-later piece of code. I am wondering whether this way of writing my generator has enough advantages that one should do so.

P.S.: I guess part of my problem with the functional solution is that, in order to minimize the number of keywords in Python, some keywords like "for", "if" and "else" have been recycled for use in expressions, so their placement in an expression takes some getting used to (the ordering in the generator expression z for x in a for y in x for z in y looks, at least to me, less natural than the ordering in the classic for loop: for x in a: for y in x: for z in y: yield z).
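The ordering point can be checked directly. Here is a minimal sketch, written for Python 3; the sample data and the name flatten_loop are made up for illustration. It shows that the for clauses of a generator expression run in the same order as the nested loops, with only the yielded expression moved to the front:

```python
# Hypothetical sample data: an iterable of iterables of iterables.
a = [["ab", "cd"], ["ef"]]

def flatten_loop(a):
    # Classic nested loops: the outermost loop is written first.
    for x in a:
        for y in x:
            for z in y:
                yield z

# Generator expression: the for clauses appear in the same order,
# but the yielded expression (z) comes first.
flat_expr = (z for x in a for y in x for z in y)

print(list(flatten_loop(a)))
print(list(flat_expr))
```

Both lines print the same flat list of characters.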

+5  A: 

I did some profiling and the regular generator function is way faster than either your second generator or my implementation.

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator1(a, b))'
10 loops, best of 3: 169 msec per loop

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator2(a, b))'
10 loops, best of 3: 489 msec per loop

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator3(a, b))'
10 loops, best of 3: 385 msec per loop

It also happens to be the most readable, so I think I'd go with that. That being said, I'll still post my solution because I think it's a cleaner example of the sort of functional programming you can do with itertools (though clearly still not optimal; I feel like it should be able to smoke the regular generator function. I'll hack on it).

import itertools as it

def generator3(parameters, strings):
    # replace strings with a generator of generators for the individual characters
    strings = (it.islice((char for string_char in string_ for char in (',', string_char)), 1, None)
               for string_ in strings)

    # interpolate strings with the notices
    strings = (it.chain(('The strings are:',), string_) for string_ in strings)

    # nest them in tuples so they're at the same level as the other generators
    separators = it.chain((('First line',),), it.cycle((('Some later line',),)))

    # replace the parameters with the appropriate tuples
    parameters = (('The parameter is:', p) if p else ('The parameter vanishes.',)
                  for p in parameters)

    # combine the separators, parameters and strings
    output = it.izip(separators, parameters, strings)

    # flatten it twice and return it
    output = it.chain.from_iterable(output)
    return it.chain.from_iterable(output)

For reference, the test case is:

def make_test_case():
    a = [i % 100 for i in range(10000)]
    b = [('12345'*10)[:(i%50)+1] for i in range(10000)]
    return a, b
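The code above targets Python 2 (izip). On Python 3 the same comparison could be sketched with the timeit module as follows. This is only a port under stated assumptions: zip replaces izip, the functions are defined inline instead of in a gen module, and the loop count of 3 is arbitrary. Timings will differ between machines and interpreter versions, so none are shown.

```python
import itertools as it
import timeit

def make_test_case():
    a = [i % 100 for i in range(10000)]
    b = [('12345' * 10)[:(i % 50) + 1] for i in range(10000)]
    return a, b

def generator1(a, b):
    # Python 3 port of the hand-written generator function from the question.
    first = True
    for i, s in zip(a, b):
        if first:
            yield "First line"
            first = False
        else:
            yield "Some later line"
        if i == 0:
            yield "The parameter vanishes."
        else:
            yield "The parameter is:"
            yield i
        yield "The strings are:"
        comma = False
        for t in s:
            if comma:
                yield ','
            else:
                comma = True
            yield t

def generator3(parameters, strings):
    # Python 3 port of the itertools version (zip replaces it.izip).
    strings = (it.islice((c for ch in s for c in (',', ch)), 1, None)
               for s in strings)
    strings = (it.chain(('The strings are:',), s) for s in strings)
    separators = it.chain((('First line',),), it.cycle((('Some later line',),)))
    parameters = (('The parameter is:', p) if p else ('The parameter vanishes.',)
                  for p in parameters)
    output = zip(separators, parameters, strings)
    # flatten twice
    return it.chain.from_iterable(it.chain.from_iterable(output))

a, b = make_test_case()

# Check that both versions agree before comparing their speed.
assert list(generator1(a, b)) == list(generator3(a, b))

for func in (generator1, generator3):
    seconds = timeit.timeit(lambda: list(func(a, b)), number=3)
    print(func.__name__, round(seconds / 3, 4), "s per loop")
```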
aaronasterling
Have posted some code under the EDIT section above.
Marc
@Marc, you still didn't fix the real confusion, which is whether you want "The parameter is:" and `i` yielded separately or as a tuple.
aaronasterling
Corrected the first piece of code (I am sorry for the confusion, but I typed the first generator function in a hurry, which made me write while instead of for). As to the difference between "yield x; yield y" and "(x, y)": The output is to be consumed by another for loop, so for my purposes it makes no difference whether the output is something like "(x, y)" or "iter((x, y))". However, in my example above, the function generator2 outputs a generator that behaves exactly like the output of the function generator. I don't see that generator2 outputs tuples (I have run the code on my system).
Marc
@Marc, you're right, my bad. The second one is confusing to read, but now that I look at it, it's clear.
aaronasterling
@Aaron, thanks a lot for your version generator3 (which showed me that one can write quite readable code using a lot of iterators) and for your profiling. I have repeated the tests on my computer and get the same ratios between the timings. For the moment, I will thus stick to the generator function, which is the most understandable. However, this still leaves open the question of when to use iterators when one could also write generator functions. (Or should one think of iterators just as convenience functions, like the ones in the operator module of the Python distribution?)
Marc
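One possible reading of the "convenience functions" idea is a generator function that keeps the imperative control flow and uses itertools only for small, well-named pieces. The following is a rough sketch for Python 3, not something proposed in the thread; the names intersperse and generator_hybrid are hypothetical, and yield from requires Python 3.3 or later.

```python
from itertools import chain, repeat

def intersperse(separator, iterable):
    """Yield the items of iterable with separator between consecutive items."""
    iterator = iter(iterable)
    for first in iterator:
        yield first
        break
    for item in iterator:
        yield separator
        yield item

def generator_hybrid(a, b):
    # "First line" once, then "Some later line" forever; zip stops at the
    # shortest input, so the infinite header iterator is harmless.
    headers = chain(["First line"], repeat("Some later line"))
    for header, i, s in zip(headers, a, b):
        yield header
        if i == 0:
            yield "The parameter vanishes."
        else:
            yield "The parameter is:"
            yield i
        yield "The strings are:"
        yield from intersperse(',', s)

a, b = (1, 0, 2), ("ab", "cd", "ef")
print(list(generator_hybrid(a, b)))
```

For the toy input from the question this should produce the same sequence as generator and generator2. The state flags (first, comma) disappear, while the branching on i stays an ordinary if statement.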