views: 150
answers: 4

I have studied the generators feature and I think I understand it, but I would like to know where I could apply it in my own code.

I have in mind the following example, which I read in the book "Python Essential Reference":

# tail -f
import time

def tail(f):
    f.seek(0, 2)                # go to the end of the file
    while True:
        line = f.readline()
        if not line:            # no new data yet; wait and poll again
            time.sleep(0.1)
            continue
        yield line

Do you have any other effective examples where generators are the best tool for the job, like tail -f?

How often do you use the generators feature, and in which kind of functionality or part of a program do you usually apply it?

+4  A: 

Whenever your code would generate an unlimited number of values or, more generally, whenever too much memory would be consumed by generating the whole list up front.

Or if it is likely that you won't iterate over the whole generated list (and the list is very large). There is no point in generating every value up front (and waiting for that generation) if it is never used.
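For illustration, a minimal sketch of that point (squares is just a made-up example): only as many values as the consumer actually asks for are ever computed.

def squares():
    # potentially unbounded sequence; each value is computed on demand
    i = 0
    while True:
        yield i * i
        i += 1

# stops pulling from the generator as soon as a match is found
first_big = next(s for s in squares() if s > 12345)
print first_big   # 12544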

My latest encounter with generators was when I implemented a linear recurrent sequence (LRS), e.g. the Fibonacci sequence.
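A minimal sketch of the Fibonacci case as a generator:

from itertools import islice

def fib():
    # infinite Fibonacci sequence, one value per iteration
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

print list(islice(fib(), 10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]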

Felix Kling
-1: Sounds to me more like a description of iterators in general, not generator functions, so it misses the point. Why did this answer get any upvotes?
nikow
@nikow: Yes it is more general, but I wouldn't say that it is a *description* of iterators. It is an abstract description of the situations in which generators can be useful. Generators are a kind of iterator. :)
Felix Kling
+6  A: 

I use them a lot when I implement scanners (tokenizers) or when I iterate over data containers.
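For the container case, generators make __iter__ trivial to write. A minimal sketch (Tree is a made-up toy container, not part of the tokenizer below):

class Tree(object):
    """Toy binary tree container."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

    def __iter__(self):
        # in-order traversal expressed as a generator
        if self.left:
            for v in self.left:
                yield v
        yield self.value
        if self.right:
            for v in self.right:
                yield v

# usage: list(Tree(2, Tree(1), Tree(3))) -> [1, 2, 3]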

Edit: here is a demo tokenizer I used for a C++ syntax-highlighting program:

whitespace = ' \t\r\n'
operators = '~!%^&*()-+=[]{};:\'"/?.,<>\\|'

def scan(s):
    "returns a token and a state/token id"
    words = {0:'', 1:'', 2:''} # normal, operator, whitespace
    state = 2 # I pick ws as first state
    for c in s:
        if c in operators:
            if state != 1:
                yield (words[state], state)
                words[state] = ''
            state = 1
            words[state] += c
        elif c in whitespace:
            if state != 2:
                yield (words[state], state)
                words[state] = ''
            state = 2
            words[state] += c
        else:
            if state != 0:
                yield (words[state], state)
                words[state] = ''
            state = 0
            words[state] += c
    yield (words[state], state)

Usage example:

>>> it = scan('foo(); i++')
>>> it.next()
('', 2)
>>> it.next()
('foo', 0)
>>> it.next()
('();', 1)
>>> it.next()
(' ', 2)
>>> it.next()
('i', 0)
>>> it.next()
('++', 1)
>>> 
Nick D
Could you post a simple snippet of a tokenizer?
systempuntoout
@systempuntoout, ok, I posted a sample.
Nick D
Good example, many thanks!
systempuntoout
@systempuntoout, you're welcome :)
Nick D
+1  A: 

In general, to separate data acquisition (which might be complicated) from consumption. In particular:

  • to concatenate the results of several B-tree queries - the DB part generates and executes the queries, yielding records from each one; the consumer only sees single data items arriving.
  • buffering (read-ahead) - the generator fetches data in blocks and yields single elements from each block; again, the consumer is separated from the gory details (see the sketch after this list).
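A minimal sketch of the read-ahead idea, assuming a fetch_block(n) callable that returns a list of up to n records (an empty list when the source is exhausted):

def buffered(fetch_block, block_size=256):
    # fetch data in blocks, but hand single items to the consumer
    while True:
        block = fetch_block(block_size)
        if not block:        # data source exhausted
            return
        for item in block:
            yield item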

Generators can also work as coroutines. You can pass data into them using nextval = g.send(data) on the 'consumer' side and data = yield nextval on the generator side. In this case the generator and its consumer 'swap' values. You can even make yield raise an exception within the generator context: g.throw(exc) does that.
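A minimal sketch of that 'swap' (the doubling logic is just a placeholder):

def doubler(first):
    value = first
    while True:
        # hand `value` to the consumer; whatever send() passes in comes
        # back as the result of the yield expression
        received = yield value
        value = received * 2

g = doubler(1)
print g.next()      # 1  -- runs the generator up to the first yield
print g.send(10)    # 20 -- 10 goes in at the yield, 20 comes back out
try:
    g.throw(ValueError('stop'))   # raised at the yield inside doubler
except ValueError:
    print 'the exception propagated out of the generator'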

Rafał Dowgird
Buffering is a great example, thanks.
systempuntoout
+2  A: 

In all cases where I have algorithms that read anything, I use generators exclusively.

Why?

Layering in filtering, mapping, and reduction rules is much easier in the context of multiple generators.

Example:

def discard_blank( source ):
    for line in source:
        if len(line) == 0:
            continue
        yield line

def clean_end( source ):
    for line in source:
        yield line.rstrip()

def split_fields( source ):
    for line in source:
        yield line.split()

def convert_pos( tuple_source, position ):
    for line in tuple_source:
        yield line[:position] + [int(line[position])] + line[position+1:]

with open('somefile','r') as source:
    data= convert_pos( split_fields( discard_blank( clean_end( source ) ) ), 0 )
    total= 0
    for l in data:
        print l
        total += l[0]
    print total

My preference is to use many small generators so that a small change is not disruptive to the entire process chain.

S.Lott
Wouldn't functions work just as well?
J.T. Hurley
So you are just using generator functions as a convenient notation for iterator decorators. I think the example of Nick D is much better, since it highlights the continuation aspect.
nikow
@J. T. Hurley: I don't know what "just as well" means, but generators don't create intermediate results, whereas functions generally do. Nested generators are a kind of map-reduce pipeline.
S.Lott
@nikow: I'm not interested in continuations. I'm interested in breaking up a big (possibly messy) operation into small map-reduce style steps.
S.Lott
@S.Lott: Continuations are an interesting feature (especially with the newer coroutine functionality), so it's too bad that you are not interested in them ;-)
nikow
@nikow: I'm not interested in them *for the purposes of this answer*. They're an interesting feature. I've never used them except as an exercise. I have always used stateful, callable objects instead of continuations. But that gets beyond this specific answer. My point was not to cover all possible uses of generators. My point was to answer the question: "any other effective example where generators are the best tool for the job".
S.Lott