views:

41085

answers:

9

What is the use of the yield keyword in Python? What does it do?

For example, I'm trying to understand this code (**):

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

And this is the caller:

result, candidates = list(), [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When do subsequent calls stop?

** The code comes from Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: http://well-adjusted.de/~jrschulz/mspace/.

+4  A: 

yield is just like return. It returns whatever you tell it to. The only difference is that the next time you call the function, execution starts from the last call to the yield statement.

In the case of your code, the function get_child_candidates is acting like an iterator so that when you extend your list, it adds one element at a time to the new list.

list.extend calls an iterator until it's exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and append that to the list.

Douglas Mayle
This is close, but not correct. Every time you call a function with a yield statement in it, it returns a brand new generator object. It's only when you call that generator's .next() method that execution resumes after the last yield.
kurosch
+1  A: 

From The Python Reference Manual:

The yield statement is only used when defining a generator function, and is only used in the body of the generator function. Using a yield statement in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.

When a generator function is called, it returns an iterator known as a generator iterator, or more commonly, a generator. The body of the generator function is executed by calling the generator's next() method repeatedly until it raises an exception.

When a yield statement is executed, the state of the generator is frozen and the value of expression_list is returned to next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time next() is invoked, the function can proceed exactly as if the yield statement were just another external call.
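
A small sketch of what "frozen" means in practice. The running_total function is a made-up example, not part of the question's code, and it uses the built-in next(), which is available on Python 2.6+ and 3:

```python
def running_total(values):
    """Yield the cumulative sum after each value (illustrative helper)."""
    total = 0                      # local state that survives between resumptions
    for value in values:
        total += value
        yield total                # pause here; 'total' and the loop position are kept


gen = running_total([10, 20, 30])
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes right after the yield, with total still equal to 10
third = next(gen)
```

Each call to next() continues from the previous yield with the local variable total intact, which is why the three results accumulate instead of starting over.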

Do you have the entire generator function listed? It doesn't look complete.

My guess from the incomplete code is that you set the parameters, and every time you call the function it returns the next child node that fits the parameters. The yield allows you to easily write a function (generator function) that is called repeatedly to gather 'chunks' of information, by 'pausing' the function, returning the value, and then continuing the function where it was paused when next called.

Adam Davis
+1  A: 

It's returning a generator. I'm not particularly familiar with Python, but I believe it's the same kind of thing as C#'s iterator blocks if you're familiar with those.

There's an IBM article which explains it reasonably well (for Python) as far as I can see.

The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values - as if the generator method was paused. Now obviously you can't really "pause" a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself.

Jon Skeet
In essence, you're correct. Python iterators are the same basic idea as C#'s IEnumerators (or IEnumerables, I always get those mixed up). In fact, I'm pretty sure the yield return keyword was inspired by python. :)
Jason Baker
@Jason Baker: at least wikipedia agrees with you: http://en.wikipedia.org/wiki/C_Sharp_programming_language (look for yield). I also have the same impression, remember mention in comp.lang.python of C# getting a "yield return", but I strangely couldn't find some official statement.
ΤΖΩΤΖΙΟΥ
+13  A: 

Think of it this way:

An iterator is just a fancy sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:

Original version:

def some_function():
    for i in xrange(4):
        yield i

for i in some_function():
    print i

This is basically what the python interpreter does with the above code:

class it:
    def __init__(self):
        self.count = -1  #start at -1 so that we get 0 when we add 1 below
    def next(self):    #the next method will be called implicitly by the for loop
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            # a StopIteration exception is raised to signal that the iterator is done; this is caught implicitly by the for loop
            raise StopIteration

def some_func():
    return it()

for i in some_func():
    print i

For more insight as to what's happening behind the scenes, the for loop can be rewritten to this:

iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass

Does that make more sense or just confuse you more? :)

EDIT: I should note that this IS an oversimplification for illustrative purposes. :)

EDIT 2: Forgot to throw the StopIteration exception
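
As a slightly fuller sketch of the hand-written iterator idea, the class below adds the __iter__ method that a for loop calls before it starts asking for values. The name CountToFour is made up for illustration, and the next = __next__ alias is only there so that the same class works on both Python 2 and Python 3:

```python
class CountToFour(object):
    """Hand-written iterator producing 0, 1, 2, 3 (hypothetical name)."""

    def __init__(self):
        self.count = -1

    def __iter__(self):
        # An iterator returns itself, so a for loop can accept it directly.
        return self

    def __next__(self):
        self.count += 1
        if self.count < 4:
            return self.count
        raise StopIteration

    next = __next__  # Python 2 spelling of the same method


collected = [i for i in CountToFour()]
```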

Jason Baker
for-loop expects an iterable (not iterator), therefore an `__iter__` method must be defined. In case of an iterator object it is very simple: `__iter__ = lambda self: self`
J.F. Sebastian
`__getitem__` could be defined instead of `__iter__`. For example: `class it: pass; it.__getitem__ = lambda self, i: i*10 if i < 10 else [][0]; for i in it(): print(i)`, It will print: 0, 10, 20, ..., 90
J.F. Sebastian
Like I said, it's intentionally over-simplified. :)
Jason Baker
+387  A: 

Iterables

To understand what yield does, you must understand what generators are. And before generators come iterables. When you create a list, you can read its items one by one, and this is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist :
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist :
...    print(i)
0
1
4

Everything you can use "for... in..." on is an iterable: lists, strings, files... These iterables are handy because you can read them as often as you wish, but you store all the values in memory, and that's not always what you want when you have a lot of values.

Generators

Generators are iterables, but you can only read them once. That's because they do not store all the values in memory; they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator :
...    print(i)
0
1
4

It is just the same, except you used () instead of []. BUT you cannot perform for i in mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end by calculating 4, one by one.
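
A quick way to see the "only once" behaviour for yourself (a minimal sketch; the variable names are just for illustration):

```python
squares = (x * x for x in range(3))
first_pass = list(squares)    # consumes the generator
second_pass = list(squares)   # nothing left to produce

# A list, by contrast, can be read as often as you like.
squares_list = [x * x for x in range(3)]
list_passes = (list(squares_list), list(squares_list))
```

The second pass over the generator comes back empty, while both passes over the list return the same three values.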

Yield

Yield is a keyword that is used like return, except the function will return a generator.

>>> def createGenerator() :
...    mylist = range(3)
...    for i in mylist :
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object !
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

This example is trivial, but the technique is handy when you know your function will return a huge set of values that you only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, which is a bit tricky :-)

Then, your code runs each time the for loop asks the generator for a value.

Now the hard part:

The first time the for loop calls the generator object created from your function, the function body runs from the beginning until it hits yield, and the yielded value becomes the first value of the loop. Each subsequent call resumes the body right after that yield and runs it until the next yield, returning the next value, until there is no value left to return.

The generator is considered empty once the function runs to the end without hitting yield again. That can happen because the loop has come to an end, or because an "if/else" condition is no longer satisfied.
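
To check this for yourself, here is a small sketch that records when each part of the function body actually runs. The log list and the function name are made up for the demonstration:

```python
log = []  # records when each part of the body actually runs


def create_generator():
    log.append("body started")
    for i in range(3):
        yield i * i
    log.append("body finished")


gen = create_generator()
log_after_call = list(log)        # still empty: calling the function did not run the body

first = next(gen)
log_after_first = list(log)       # the body has now run up to the first yield

rest = list(gen)                  # drains the remaining values
log_at_end = list(log)
```

Nothing is logged when create_generator() is called; "body started" only appears after the first next(), and "body finished" only once the generator has been exhausted.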

Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty:
    # there are no more than two values, the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning) 
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidates list 
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list, but the list expands while the loop is being iterated :-) It's a concise way to go through all this nested data, even if it's a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but the while loop keeps creating new generator objects, which produce different values from the previous ones since they are not applied to the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code it gets a generator, which is good because:

  1. You don't need to read the values twice.
  2. You may have a lot of children and you don't want them all stored in memory.

And it works because Python does not care whether the argument of a method is a list or not. Python expects iterables, so it will work with strings, lists, tuples and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question...
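
If you want something you can actually run, here is a stripped-down sketch of the same pattern. The Node class, children() and collect_values() are hypothetical stand-ins, not the mspace library's code, and the distance-based pruning of the original is deliberately left out:

```python
class Node(object):
    """Hypothetical stand-in for the question's node class."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def children(self):
        # Yields zero, one, or two nodes, like _get_child_candidates,
        # but without the distance checks of the original.
        if self.left:
            yield self.left
        if self.right:
            yield self.right


def collect_values(root):
    result, candidates = [], [root]
    while candidates:
        node = candidates.pop()
        result.append(node.value)
        # extend() drains the fresh generator, pushing this node's children.
        candidates.extend(node.children())
    return result


tree = Node(1, Node(2, Node(4)), Node(3))
values = collect_values(tree)
```

Every pass through the while loop creates a new generator for the current node, and extend() consumes it completely, so the loop ends once every reachable node has been popped.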

You can stop here, or read a little bit more to see an advanced use of a generator:

Controlling generator exhaustion

>>> class Bank(): # let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self) :
...        while not self.crisis :
...            yield "$100"
>>> hsbc = Bank() # when everything is fine, you can get as much money as you want from an ATM
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # crisis is coming, no more money !
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # but it's true even for newly built ATM
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # trouble is, when the crisis is off, the ATM are still empty...
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # but if you build new ones, you're in business again !
>>> for cash in brand_new_atm :
...    print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

It can be useful for various things like controlling access to a resource.
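
Note that on a real interpreter an exhausted generator raises StopIteration instead of printing it, so the session above is schematic on that point. Here is a sketch of the same idea that catches the exception explicitly; it uses the built-in next() (Python 2.6+ and 3), and the variable names are made up:

```python
class Bank(object):
    crisis = False

    def create_atm(self):
        while not self.crisis:
            yield "$100"


bank = Bank()
atm = bank.create_atm()
withdrawals = [next(atm), next(atm)]

bank.crisis = True
try:
    next(atm)                      # the while condition is now false, so the body ends
    exhausted = False
except StopIteration:
    exhausted = True

bank.crisis = False
still_empty = next(atm, None) is None    # a finished generator never restarts
new_atm_cash = next(bank.create_atm())   # a newly created generator works again
```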

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.

An example? Let's see the possible orders of arrival for a four-horse race:

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]
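
For the "duplicate" and "chain" wishes mentioned earlier, a short sketch using itertools.chain, itertools.tee and itertools.islice. The numbers() generator is a made-up helper for the demonstration:

```python
import itertools


def numbers(start, stop):
    """Small illustrative generator."""
    for n in range(start, stop):
        yield n


# Chain two generators without building an intermediate list.
chained = list(itertools.chain(numbers(0, 3), numbers(10, 13)))

# "Duplicate" a generator: tee() returns independent iterators over one source.
copy_a, copy_b = itertools.tee(numbers(0, 3))
duplicated = (list(copy_a), list(copy_b))

# Take only the first items of a generator, lazily.
first_two = list(itertools.islice(numbers(0, 100), 2))
```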

Understanding the inner mechanisms of iteration

Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
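
A minimal sketch of the difference, using the built-in iter() and next() functions (the variable names are only illustrative):

```python
mylist = [1, 2, 3]

has_iter = hasattr(mylist, "__iter__")      # lists are iterable
iterator = iter(mylist)                     # ask the iterable for an iterator
same_object = iter(iterator) is iterator    # an iterator's __iter__ returns itself

first = next(iterator)
second = next(iterator)

# A second call to iter() on the list starts over with a fresh iterator.
fresh_first = next(iter(mylist))
```

The list itself never "runs out"; only the iterator obtained from it does, which is why a list can be looped over many times.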

More about it in this article about how the for loop works.

Oh, and if you liked this answer, you'll probably like my explanation for decorators.


/EDIT: this section used to contain answers to comments. Since the last revision, I have merged everything.

e-satis
Do you mean to yield i*i in the last example?
MattSmith
yes, thx, corrected :-)
e-satis
good job, I don't know if you wrote it just now, but I learned generator and yield from your writing.
Haoest
Thank you so much!!!, your answer (and python, of course) has impressed me!!!
Alex. S.
Since extend() expects an iterator, won't it actually force the generator to spit all of its values out? Since there is no loop in the _get_child_candidates, won't the generator produce at most 2 items?
Mr Fooz
@Haoest: yep, wrote it just yesterday. And your "thank you" warmed me up this morning, so I'll write others like this one :-) @alex: you're welcome. I wasn't sure my answer would fit since I am French and sometimes my English is a bit rusty. @fooz: adding the answer in the text.
e-satis
Wow! Great answer!
Auron
`mylist` is *not* an iterator (hasattr(mylist, 'next') == False). It is an iterable (hasattr(mylist, '__iter__') == True). `mylist.extend` accept an iterable (not iterator) as argument. To understand how the for-loop works in Python read http://effbot.org/zone/python-for-statement.htm
J.F. Sebastian
Thanks for pointing this out! I learned something by answering the question!
e-satis
Read it. enjoyed it. fav'd it. thank you.
Nick Stinemates
Read to the end. Nice answer.
ryan_s
Thx for the feedback :-)
e-satis
Yes, it was loooooooooooooong but nice.
JV
thanks for this answer, I found it while looking for info on generators, and yes I read all the way to the bottom :)
Asa Ayers
Wow, there is still people to read it :-)
e-satis
And there will be plenty more reading your answer for quite some time to come. I've done very little Python so far, but your answer has taught me a lot about the yield statement. If I ever have to use it, I know that, because of your answer, I can use it with confidence. +1
RobH
And people are still reading this great answer ...
Adam Nelson
Nice. Thanks for writing it down. As long as I got comment on this, I will regularly check if it doesn't need any update !
e-satis
Epic answer. I got to the bottom. Thanks, e-satis!
bryan
I just wanted to say thanks!, this is gold!
igorgue
Nice explanation
DShook
Priceless. I knew iterators, but it was totally worth reading for a banking story.
ilya n.
Great answer, and yes i read the whole thing... skimmed a little bit yes..
Espen Christensen
Thks pale, feedback always appreciated.
e-satis
And there are still people that read it. :) I learned a lot from it; thanks muchly for the detailed explanation!
Brooks Moses
Quality answer. Totally over my newbie head but I read it right to the bottom - and found some strange understanding creeping in!. I liked the fact you took the question and broke it down into it's components - beginning with the foundations and then adding the rest on top, until you answered the question itself. Great use of comments in the code by the way - so very useful.
Mike
Thanks for explicitly saying why you think this answer is useful. That way, I can improve future answers using your points. Cheers.
e-satis
It's worth it. I bookmarked this months ago and come back to it every once in a while when I forget how these work. It's a great tutorial.
vgm64
One year later, someone still finds it useful. Thanks :-)
Flavius
Cracking answer. Cheers.
Gavin Gilmour
Lol, eventually somebody has downvoted it. It had to happen :-) Can the author tell me why, so I can improve it ?
e-satis
@e-satis: To make it be at exactly 100 upvotes?
voyager
Come on, 64, 128, or 256 is ok. But 100 seriously, can't he pick a round number ? Or 42 at least...
e-satis
42 is not a vote number, is the answer in itself.
voyager
I know, I'm panicking. I should not panic. Where is my towel?
e-satis
Never leave your house without a towel. I'd much rather be happy than right any day.
voyager
I read it to the bottom and this answer was excellent!
Adam
Great answer! One thing I noticed, in the paragraph "To master yield, [...] The function only returns the iterator object, this is bit tricky :-)" Shouldn't it say "The function only returns the GENERATOR object, ..."?
cschol
Thanks for the typo.
e-satis
The explanation of iterators is flawed. `mylist` is *not* an iterator, it's a list that can return an iterator via the `__iter__` function. Iterators *don't* "store everything in memory".
Craig McQueen
Yes, it has been noticed already and corrected at the end of the answer.
e-satis
Since you have the capability to fully edit, you should eliminate the error entirely. Your errata note at the end of such a long answer is only going to be read by a small percentage of readers.
Craig McQueen
Having used python only twice, this answer has made me reconsider. Simply marvelous.
Mark Essel
Learned a lot from your answer. Thanks! Great work!
dusan
+3  A: 

An example in plain language. I will provide a correspondence between high-level human concepts and low-level Python concepts.

I want to operate on a sequence of numbers, but I don't want to bother myself with the creation of that sequence; I only want to focus on the operation I want to do. So, I do the following:

  • I call you and tell you that I want a sequence of numbers which is produced in a specific way, and I let you know what the algorithm is.
    This step corresponds to defining the generator function, i.e. the function containing a yield.
  • Sometime later, I tell you, "ok, get ready to tell me the sequence of numbers".
    This step corresponds to calling the generator function which returns a generator object. Note that you don't tell me any numbers yet, you just grab your paper and pencil.
  • I ask you, "tell me the next number", and you tell me the first number; after that, you wait for me to ask you for the next number. It's your job to remember where you were, what numbers you have already said, what is the next number. I don't care about the details.
    This step corresponds to calling .next() on the generator object.
  • … repeat previous step, until…
  • eventually, you might come to an end. You don't tell me a number, you just shout, "hold your horses! I'm done! No more numbers!"
    This step corresponds to the generator object ending its job and raising a StopIteration exception. The generator function does not need to raise the exception; it's raised automatically when the function ends or issues a return.

This is what a generator does (a function that contains a yield); it starts executing, pauses whenever it does a yield, and when asked for a .next() value it continues from the point it was last. It fits perfectly by design with the iterator protocol of python, which describes how to sequentially request for values.

The most famous user of the iterator protocol is the for command in python. So, whenever you do a:

for item in sequence:

it doesn't matter if sequence is a list, a string, a dictionary or a generator object like described above; the result is the same: you read items off a sequence one by one.

Note that defining a function which contains a yield keyword is not the only way to create a generator; it's just the easiest way to create one.
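
One other way is a generator expression. As a small sketch (the names are made up), both forms below produce objects of the same generator type:

```python
import types


def squares_function(n):
    for x in range(n):
        yield x * x


from_function = squares_function(4)
from_expression = (x * x for x in range(4))

same_type = (isinstance(from_function, types.GeneratorType)
             and isinstance(from_expression, types.GeneratorType))
results = (list(from_function), list(from_expression))
```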

For more accurate information, read about iterator types, the yield statement and generators in the Python documentation.

ΤΖΩΤΖΙΟΥ
+7  A: 

I feel like I post a link to this presentation every day: David M. Beazley's Generator Tricks for Systems Programmers. If you're a Python programmer and you're not extremely familiar with generators, you should read this. It's a very clear explanation of what generators are, how they work, what the yield statement does, and it answers the question "Do you really want to mess around with this obscure language feature?"

SPOILER ALERT. The answer is: Yes. Yes, you do.

Robert Rossney
+2  A: 

There's one extra thing to mention: a function that yields doesn't actually have to terminate. I've written code like this:

def fib():
    yield 1
    yield 1
    cur = 1
    last = 1
    while True:
        cur, last = cur+last, cur
        yield cur

Then I can use it in other code like this:

for f in fib():
    if some_condition: break
    coolfuncs(f)

It really helps simplify some problems, and makes some things easier to work with.
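
Since the generator never ends on its own, the caller decides when to stop. Besides a break in a for loop, one possible way is itertools.islice or itertools.takewhile; this is just a sketch, with fib() repeated so the snippet runs by itself:

```python
import itertools


def fib():
    yield 1
    yield 1
    cur = 1
    last = 1
    while True:
        cur, last = cur + last, cur
        yield cur


# Take a fixed number of values from the endless sequence.
first_ten = list(itertools.islice(fib(), 10))

# Or stop as soon as a condition fails.
below_fifty = list(itertools.takewhile(lambda f: f < 50, fib()))
```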

Claudiu
+34  A: 

Shortcut to Grokking yield

When you see a function with yield statements, apply this easy trick to understand what will happen:

  1. Insert a line result = [] at the start of the function.
  2. Replace each yield expr with result.append(expr).
  3. Insert a line return result at the bottom of the function.
  4. Yay - no more yield statements! Read and figure out code.
  5. Revert function to original definition.

This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different than what happens in the list based approach. In many cases the yield approach will be a lot more memory efficient and faster too. In other cases this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more...
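
Here is the trick applied to a tiny made-up function, with the generator version and the list-based rewrite side by side:

```python
def evens_generator(limit):
    for n in range(limit):
        if n % 2 == 0:
            yield n


def evens_list(limit):
    result = []                    # step 1: start with an empty list
    for n in range(limit):
        if n % 2 == 0:
            result.append(n)       # step 2: was 'yield n'
    return result                  # step 3: return the collected values


same_values = list(evens_generator(10)) == evens_list(10)
```

Both versions produce the same values here, but only because the loop is finite; with an endless loop the list version would never reach its return statement.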

Don't confuse your Iterables, Iterators and Generators

First, the iterator protocol - when you write

for x in mylist:
    ...loop body...

Python performs the following two steps:

  1. Gets an iterator for mylist:

    Call iter(mylist) -> this returns an object with a next() method.

    [This is the step most people forget to tell you about]

  2. Uses the iterator to loop over items:

    Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If an exception StopIteration is raised from within next(), it means there are no more values in the iterator and the loop is exited.

The truth is Python performs the above two steps anytime it wants to loop over the contents of an object - so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).

Here mylist is an iterable because it implements the iterator protocol. In a user defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method. It is possible to implement both __iter__() and next() on the same class, and have __iter__() return self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.
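
A sketch of the "separate iterator" arrangement, using hypothetical Countdown and CountdownIterator classes. The next = __next__ alias is there so the code runs on both Python 2 and Python 3:

```python
class CountdownIterator(object):
    """Holds the position of one loop over a Countdown."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 spelling


class Countdown(object):
    """Iterable: every call to __iter__ hands out a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


countdown = Countdown(3)
# Two loops over the same object at the same time, each with its own state.
pairs = [(a, b) for a in countdown for b in countdown]
```

Because each loop gets its own CountdownIterator, the nested loops do not disturb each other, which would not be the case if __iter__ returned self on a single shared object.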

So that's the iterator protocol. Many objects implement it:

  1. Built-in lists, dictionaries, tuples, sets, files.
  2. User defined classes that implement __iter__().
  3. Generators.

Note that a for loop doesn't know what kind of object it's dealing with - it just follows the iterator protocol, and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return the keys one by one, files return the lines one by one, etc. And generators return... well that's where yield comes in:

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print item

Instead of yield statements, if you had three return statements in f123() only the first would get executed, and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit - it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state, runs until the next yield statement and returns that as the next item. This happens until the function exits, at which point the generator raises StopIteration, and the loop exits.

So the generator object is sort of like an adapter - at one end it exhibits the iterator protocol, by exposing __iter__() and next() methods to keep the for loop happy. At the other end however, it runs the function just enough to get the next value out of it, and puts it back in suspended mode.
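
You can poke at both ends of this adapter directly. A minimal sketch, using the built-in iter() and next() (Python 2.6+ and 3):

```python
def f123():
    yield 1
    yield 2
    yield 3


gen = f123()
is_own_iterator = iter(gen) is gen     # __iter__ returns the generator itself
first = next(gen)                      # run until the first yield, then suspend
remaining = [item for item in gen]     # the loop picks up where next() left off
finished = next(gen, "done")           # the body has ended, so the default is returned
```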

Why Use Generators?

Usually you can write code that doesn't use generators but implements the same logic. One option is to use the temporary list 'trick' I mentioned before. That will not work in all cases, e.g. if you have infinite loops, and it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps state in instance members and performs the next logical step in its next() method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.
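
To make the comparison concrete, here is a sketch of the same sequence written both ways. The names pairs_generator and PairsIter are made up for illustration; even for two nested loops, the class has to track the loop state by hand:

```python
def pairs_generator(rows, cols):
    for r in range(rows):
        for c in range(cols):
            yield (r, c)


class PairsIter(object):
    """Same sequence as pairs_generator, with the loop state kept by hand."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.r, self.c = 0, 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.cols == 0 or self.r >= self.rows:
            raise StopIteration
        value = (self.r, self.c)
        self.c += 1
        if self.c == self.cols:     # end of a row: wrap to the next one
            self.c = 0
            self.r += 1
        return value

    next = __next__  # Python 2 spelling


same = list(pairs_generator(2, 3)) == list(PairsIter(2, 3))
```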