I have run into a few examples of managing threads with the threading module (using Python 2.6).

What I am trying to understand is how this example calls the "run" method, and where. I do not see it called anywhere. The ThreadUrl class gets instantiated in the main() function as "t", and this is where I would normally expect the code to call the "run" method.

Maybe this is not the preferred way of working with threads? Please enlighten me:

#!/usr/bin/env python

import Queue
import time
import urllib2
import threading
import datetime

hosts = ["http://example.com/", "http://www.google.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # grab a host from the queue (blocks until one is available)
            host = self.queue.get()

            # fetch the URL and print the first 10 bytes of the page
            url = urllib2.urlopen(host)
            print url.read(10)

            # signal to the queue that this job is done
            self.queue.task_done()

start = time.time()

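def main():

    # spawn a pool of threads, and pass them the queue instance
    for i in range(1):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    # populate the queue with hosts (once, outside the pool loop)
    for host in hosts:
        queue.put(host)

    # wait until every queued item has been processed
    queue.join()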
main()
print "Elapsed time: %s" % (time.time() - start)
+2  A: 

The run() method is called behind the scenes by threading.Thread (look up the OOP concepts of inheritance and polymorphism). The call happens shortly after t.start() is called, in the newly created thread.

If you have access to threading.py (you can find it in the Python library folder), you will see a class named Thread. That class has a method called start(). start() calls '_start_new_thread(self.__bootstrap, ())', a low-level thread start-up function that runs a wrapper method called '__bootstrap()' in a new thread. '__bootstrap()' then calls '__bootstrap_inner()', which does some more preparation before finally calling 'run()'.
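To make that call chain concrete, here is a rough sketch of the same idea. It is not the real threading.py: MiniThread, Hello and the _bootstrap name are made up for illustration, and the real class does far more bookkeeping. It is written for Python 3, where the low-level module is called _thread (Python 2 calls it thread).

```python
import _thread
import threading


class MiniThread(object):
    """Hypothetical, heavily simplified stand-in for threading.Thread."""

    def __init__(self):
        # Set once run() has returned, so join() has something to wait on.
        self._finished = threading.Event()

    def start(self):
        # Ask the low-level module to execute the wrapper in a new thread.
        _thread.start_new_thread(self._bootstrap, ())

    def _bootstrap(self):
        # This executes in the new thread and is the only place run() is called.
        try:
            self.run()
        finally:
            self._finished.set()

    def run(self):
        # Subclasses override this, just as they do with threading.Thread.
        pass

    def join(self, timeout=None):
        self._finished.wait(timeout)


class Hello(MiniThread):
    def __init__(self):
        MiniThread.__init__(self)
        self.ran_in = None

    def run(self):
        # Record which thread actually executed run().
        self.ran_in = _thread.get_ident()


caller_ident = _thread.get_ident()
t = Hello()
t.start()   # nothing in this script ever calls t.run() directly
t.join()
```

After join() returns, t.ran_in holds the identifier of the thread that executed run(), and it differs from caller_ident. The subclass only supplies run(); the base class decides when and where it gets called.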

Read the source, you can learn a lot. :D

NawaMan
+1 It's magic. If you're going to use threads try PyQt threads, as they are more optimized and allow signals/slots to communicate across threads.
Chazadanga
A: 

t.start() creates a new thread in the OS, and when this thread begins it calls the Thread object's run() method (or a different function, if you provide a target in the Thread constructor).
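A small sketch of the two options, side by side. The names work, Worker, worker-a and worker-b are invented for the example, and it is written in Python 3 syntax.

```python
import threading

results = []


def work(label):
    # Record the label together with the name of the thread that ran us.
    results.append((label, threading.current_thread().name))


class Worker(threading.Thread):
    # Option 1: subclass Thread and override run().
    def run(self):
        work("subclass")


a = Worker(name="worker-a")

# Option 2: leave run() alone and hand a function to the constructor.
b = threading.Thread(target=work, args=("target",), name="worker-b")

for t in (a, b):
    t.start()
for t in (a, b):
    t.join()
```

In both cases the work happens in the new thread, and in neither case does the calling code invoke run() itself.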

FogleBird
+4  A: 

Per the pydoc:

Thread.start()

Start the thread’s activity.

It must be called at most once per thread object. It arranges for the object’s run() method to be invoked in a separate thread of control.

This method will raise a RuntimeError if called more than once on the same thread object.
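The "at most once" rule is easy to check for yourself. A minimal sketch in Python 3 syntax (second_start_error is just a name made up for the example); the exception that comes back is the built-in RuntimeError:

```python
import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()

try:
    # A Thread object cannot be restarted, even after it has finished.
    t.start()
except RuntimeError as exc:
    second_start_error = exc
else:
    second_start_error = None
```

If you need to run the same work again, create a new Thread object instead of reusing the old one.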

The way to think of Python Thread objects is that they take some chunk of Python code that is written synchronously (either in the run method or via the target argument) and hand it to low-level C code that knows how to run it in a separate thread. The beauty of this is that you get to treat start like an opaque method: you don't have any business overriding it unless you're rewriting the class in C, but you get to treat run very concretely. This can be useful if, for example, you want to test your thread's logic synchronously. All you need to do is call t.run() and it will execute in the calling thread, just as any other method would.
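Here is a quick way to see the difference between calling run() and calling start(). Recorder is a made-up class for the example, and the code is in Python 3 syntax.

```python
import threading


class Recorder(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.ran_in = None

    def run(self):
        # Remember which thread executed this method.
        self.ran_in = threading.current_thread()


caller = threading.current_thread()

direct = Recorder()
direct.run()          # plain method call: no new thread is created

started = Recorder()
started.start()       # new thread: run() is invoked for us over there
started.join()
```

direct.ran_in ends up being the caller's own thread, while started.ran_in is the started object itself. That is why calling run() directly is handy for testing the logic, and also why it gives you no concurrency.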

David Berger
Thanks for the GREAT answer, I failed at looking at the docs.
alfredodeza