I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.

I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?

I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.

+2  A: 

As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in Python.

BipedalShark
+2  A: 

As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.

Here is a tutorial on threading in Python: http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
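
To make the idea concrete, here is a minimal sketch of a few worker threads that download in the background and hand finished results to the main thread through a queue. It is written for Python 3 (where the urllib functions live in urllib.request), and the names download, start_downloads, drain and the fetch parameter are made up for illustration, not part of any library.

```python
import queue
import threading
from urllib.request import urlopen


def download(url):
    """Return the raw bytes at url (blocking call)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def start_downloads(urls, results, fetch=download, num_workers=8):
    """Start worker threads that put (url, data, error) tuples on results."""
    pending = queue.Queue()
    for url in urls:
        pending.put(url)

    def worker():
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to fetch
            try:
                results.put((url, fetch(url), None))
            except Exception as exc:
                # Report the failure instead of silently killing the thread.
                results.put((url, None, exc))

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    return threads


def drain(results):
    """Return everything finished so far without blocking the caller."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished
```

The main thread never waits on the network: it only calls drain(results) now and then and displays whatever has arrived. In a Pyglet app, one plausible place for that call is a function registered with pyglet.clock.schedule_interval, though that wiring is an assumption about your app and is not shown here.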

Brandon E Taylor
Thanks for the tutorial, it was helpful.
Kiv
+2  A: 

Here's an example of how to use threading.Thread. Just replace the class name and the run function with your own. Note that threading is great for I/O-bound applications like yours and can really speed them up. Using Python threads strictly for computation in standard Python doesn't help, because only one thread can execute Python code at a time.

import threading, time

class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple

    def run(self):
        # sleeps 3 seconds, then prints 'pong' self.multiple times
        time.sleep(3)
        printString = 'pong' * self.multiple
        print(printString)

pingInstance = Ping(3)
pingInstance.start()  # start() calls your run() method in the new thread
print("pingInstance is alive? : %d" % pingInstance.is_alive())  # True, printed as 1
print("Number of threads alive: %d" % threading.active_count())
# main thread + class instance
time.sleep(3.5)
print("Number of threads alive: %d" % threading.active_count())
print("pingInstance is alive?: %d" % pingInstance.is_alive())
# is_alive() returns False once your thread reaches the end of its run method.
# only the main thread now
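
Adapting that pattern to downloads could look roughly like the sketch below (Python 3). Fetcher, read_url and fetch_many are hypothetical names chosen for this example; the only real change from Ping is that run() stores its result on the instance so the main thread can pick it up later.

```python
import threading
from urllib.request import urlopen


def read_url(url):
    """Return the raw bytes at url (blocking call)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


class Fetcher(threading.Thread):
    """Hypothetical Thread subclass that downloads one URL in run()."""

    def __init__(self, url, fetch=read_url):
        threading.Thread.__init__(self)
        self.url = url
        self.fetch = fetch
        self.data = None   # filled in on success
        self.error = None  # filled in on failure

    def run(self):
        try:
            self.data = self.fetch(self.url)
        except Exception as exc:
            self.error = exc


def fetch_many(urls, fetch=read_url):
    """Start one Fetcher per URL and return the started threads."""
    fetchers = [Fetcher(url, fetch) for url in urls]
    for fetcher in fetchers:
        fetcher.daemon = True  # don't keep the app alive for stuck downloads
        fetcher.start()
    return fetchers
```

The main thread can check fetcher.is_alive() from time to time and display fetcher.data once it returns False. One thread per URL is simple, but with hundreds of thumbnails a bounded number of workers is probably kinder to the machine and the server.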
Sean
+2  A: 

You'll probably benefit from the threading or multiprocessing modules. You don't actually need to create all those Thread-based classes yourself; there is a simpler method using Pool.map:

from multiprocessing import Pool

def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...

pool = Pool(processes=4)
result = pool.map(fetch_url, image_url_list)
drdaeman
Is there an equivalent of Pool for threads? This looks like it runs separate processes, which might be heavier than I need.
Kiv
I couldn't find a built-in one, but I did find this: http://www.chrisarndt.de/projects/threadpool/ and it looks quite similar.
drdaeman
Thanks, threadpool is working great.
Kiv
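
For later readers: the standard library does ship a thread-backed pool with the same interface, multiprocessing.pool.ThreadPool (also reachable as multiprocessing.dummy.Pool). A minimal Python 3 sketch follows; fetch_all and its fetch parameter are names invented for this example, and fetch_url here is just one possible body for the placeholder in the answer above.

```python
from multiprocessing.pool import ThreadPool
from urllib.request import urlopen


def fetch_url(url):
    """Return (url, bytes) for one thumbnail (blocking call)."""
    with urlopen(url, timeout=10) as response:
        return url, response.read()


def fetch_all(urls, fetch=fetch_url, num_threads=4):
    """Yield each result as soon as it is ready, in completion order."""
    pool = ThreadPool(num_threads)
    try:
        # imap_unordered hands back results as workers finish them,
        # rather than waiting for the whole list like map does.
        for result in pool.imap_unordered(fetch, urls):
            yield result
    finally:
        pool.close()
        pool.join()
```

Iterating over fetch_all(...) still blocks while it waits for the next result, so in a GUI you would probably run that loop off the main thread and pass results back through a queue.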
A: 

You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.

Marius Gedminas
+1  A: 

You have these choices:

  • Threads: easiest, but they don't scale well.
  • Twisted: medium difficulty. Scales well, but all the work shares one CPU because of the GIL and because it is single-threaded.
  • Multiprocessing: hardest. Scales well if you know how to write your own event loop.

I recommend just using threads unless you need an industrial scale fetcher.

Unknown