ansaurus

Question

itertools or hand-written generator - what is preferable?

Answer 1

+5 A:

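I did some profiling, and the regular generator function (generator1) is way faster than either your second generator (generator2) or my implementation (generator3): roughly 2.9 and 2.3 times faster, respectively, in the timings below.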

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator1(a, b))'
10 loops, best of 3: 169 msec per loop

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator2(a, b))'
10 loops, best of 3: 489 msec per loop

$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator3(a, b))'
10 loops, best of 3: 385 msec per loop

It also happens to be the most readable, so I think I'd go with that. That said, I'll still post my solution because I think it's a cleaner example of the sort of functional programming you can do with itertools. It's clearly still not optimal, though: I feel like it should be able to smoke the regular generator function. I'll hack on it.

import itertools as it

def generator3(parameters, strings):
    # replace strings with a generator of generators for the individual characters
    # (each character is preceded by ',', then islice drops the leading ',')
    strings = (it.islice((char for string_char in string_ for char in (',', string_char)), 1, None)
               for string_ in strings)

    # prefix each string's characters with the notice
    strings = (it.chain(('The strings are:',), string_) for string_ in strings)

    # nest them in tuples so they're at the same level as the other generators
    separators = it.chain((('First line',),), it.cycle((('Some later line',),)))

    # replace the parameters with the appropriate tuples
    parameters = (('The parameter is:', p) if p else ('The parameter vanishes.',)
                  for p in parameters)

    # combine the separators, parameters and strings
    # (it.izip is Python 2; on Python 3 use the built-in zip instead)
    output = it.izip(separators, parameters, strings)

    # flatten it twice and return it
    output = it.chain.from_iterable(output)
    return it.chain.from_iterable(output)
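
The two tricks generator3 leans on are easier to see in isolation. Here is a minimal sketch, written for Python 3; the helper names interleave_commas and flatten are mine, purely for illustration, and are not part of the code above.

```python
import itertools as it


def interleave_commas(string_):
    """Yield the characters of string_ with ',' between them."""
    # Emit ',' before every character, then skip the very first ','.
    pairs = (item for char in string_ for item in (',', char))
    return it.islice(pairs, 1, None)


def flatten(nested):
    """Remove one level of nesting, lazily."""
    return it.chain.from_iterable(nested)


chars = list(interleave_commas('abc'))

# Two levels of nesting, shaped like what the zip step in generator3 produces.
nested = [
    (('First line',), ('The parameter is:', 7)),
    (('Some later line',), ('The parameter vanishes.',)),
]
flat = list(flatten(flatten(nested)))

print(chars)
print(flat)
```

Everything stays lazy until list() is called, which is why generator3 can hand back the chained iterator directly.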

For reference, the test case is:

def make_test_case():
    a = [i % 100 for i in range(10000)]
    b = [('12345'*10)[:(i%50)+1] for i in range(10000)]
    return a, b
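
To reproduce the comparison from inside Python rather than from the shell, something like the following should work. This is only a sketch for Python 3: plain_generator is my guess at what a hand-written equivalent looks like (the original generator1 is not shown in this answer), and generator3 is ported by swapping it.izip for zip. It checks that both produce the same items and then times them with timeit.repeat; the numbers will of course differ from machine to machine.

```python
import itertools as it
import timeit


def make_test_case():
    a = [i % 100 for i in range(10000)]
    b = [('12345' * 10)[:(i % 50) + 1] for i in range(10000)]
    return a, b


def plain_generator(parameters, strings):
    # Hypothetical stand-in for generator1, which is not shown in this answer.
    separator = 'First line'
    for p, string_ in zip(parameters, strings):
        yield separator
        separator = 'Some later line'
        if p:
            yield 'The parameter is:'
            yield p
        else:
            yield 'The parameter vanishes.'
        yield 'The strings are:'
        for i, char in enumerate(string_):
            if i:
                yield ','  # comma between characters, not before the first
            yield char


def generator3(parameters, strings):
    # Python 3 port of the answer's generator3 (zip replaces it.izip).
    strings = (it.islice((char for string_char in string_
                          for char in (',', string_char)), 1, None)
               for string_ in strings)
    strings = (it.chain(('The strings are:',), string_) for string_ in strings)
    separators = it.chain((('First line',),), it.cycle((('Some later line',),)))
    parameters = (('The parameter is:', p) if p else ('The parameter vanishes.',)
                  for p in parameters)
    output = zip(separators, parameters, strings)
    return it.chain.from_iterable(it.chain.from_iterable(output))


a, b = make_test_case()

# Both versions should yield exactly the same flat sequence of items.
same_output = list(plain_generator(a, b)) == list(generator3(a, b))
first_items = list(it.islice(plain_generator(a, b), 4))

# Best of three single runs for each version, in seconds.
timings = {
    f.__name__: min(timeit.repeat(lambda: list(f(a, b)), number=1, repeat=3))
    for f in (plain_generator, generator3)
}

print(same_output)
for name, seconds in timings.items():
    print(name, seconds)
```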

aaronasterling 2010-10-03 12:44:54

I have posted some code under the EDIT section above.

Marc 2010-10-03 18:08:58

@Marc, you still didn't fix the real confusion, which is whether you want "The parameter is:" and `i` yielded separately or as a tuple.

aaronasterling 2010-10-04 07:35:43

Corrected the first piece of code. (I am sorry for the confusion, but I typed the first generator function in a hurry, which made me write while instead of for.) As to the difference between "yield x; yield y" and "(x, y)": the output is to be consumed by another for loop, so for my purposes it makes no difference whether the output is something like "(x, y)" or "iter((x, y))". However, in my example above, the function generator2 outputs a generator, which behaves exactly like the output of the first generator function. I don't see that generator2 outputs tuples (I have run the code on my system).
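
For anyone following along, the distinction being discussed can be sketched like this. The function names are made up for illustration and are not the generator1/generator2 from the question.

```python
import itertools as it


def yields_separately(pairs):
    # "yield x; yield y": the consumer sees a flat stream of items.
    for x, y in pairs:
        yield x
        yield y


def yields_tuples(pairs):
    # "(x, y)": the consumer sees one tuple per pair.
    for x, y in pairs:
        yield (x, y)


pairs = [('The parameter is:', 1), ('The parameter is:', 2)]

flat = list(yields_separately(pairs))
nested = list(yields_tuples(pairs))

# One level of flattening turns the tuple version into the flat one.
flattened = list(it.chain.from_iterable(yields_tuples(pairs)))

print(flat)
print(nested)
```

So the two only look the same to a consumer that flattens (or iterates over) each yielded item itself.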

Marc 2010-10-04 07:41:46

@Marc, you're right. My bad: the second one is confusing to read, but now that I look at it, it's clear.

aaronasterling 2010-10-04 08:58:58

@Aaron, thanks a lot for your version, generator3 (which showed me that one can write quite readable code using a lot of iterators), and for your profiling. I have repeated the tests on my computer and get the same ratios between the timings. For the moment, I will thus stick to the generator function, which is the most understandable. However, this still leaves open the question of when to use itertools when one could also write generator functions. (Or should one think of the itertools functions just as convenience functions, like the ones in the operator module of the Python distribution?)

Marc 2010-10-05 08:50:28
