views: 107

answers: 2
Hi there. In my program I need to download 3-4 files simultaneously (from different servers, which are quite slow). I'm aware of the solutions involving Python threads or Qt threads, but I'm wondering: since this seems to be quite a common task, maybe there's a library which I can feed with URLs and simply receive the files? Thanks in advance!

+1  A: 

You don't really need a library; this is a simple use of threads (well, inasmuch as threads can be 'simple'). See e.g. http://www.artfulcode.net/articles/multi-threading-python/ for a neat tutorial.
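If you go that route, a minimal sketch with just the standard library might look like the following (an illustration only; it uses Python 2's urllib2 to match the other answer, and leaves out timeouts and error handling):

    import threading
    import urllib2

    def fetch(url, results):
        # Each thread downloads one URL and stores the body under its URL.
        results[url] = urllib2.urlopen(url).read()

    def download_all(urls):
        results = {}
        threads = [threading.Thread(target=fetch, args=(url, results)) for url in urls]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for every download to finish
        return results

For 3-4 slow servers this is usually all you need; the threads spend their time waiting on the network, so the GIL is not a problem here.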

katrielalex
+3  A: 

Yes, there is one - pycurl.

It's not exactly 'simple', since curl is fairly low-level, but it does exactly what you need - you give it some URLs and it downloads them simultaneously and asynchronously.

import pycurl
from StringIO import StringIO

def LoadMulti(urls):
    # One CurlMulti object drives all transfers concurrently.
    m = pycurl.CurlMulti()
    handles = {}
    for url in urls:
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        data = StringIO()
        header = StringIO()
        # Collect the body and headers in in-memory buffers.
        c.setopt(pycurl.WRITEFUNCTION, data.write)
        c.setopt(pycurl.HEADERFUNCTION, header.write)
        handles[url] = dict(data=data, header=header, handle=c)
        m.add_handle(c)
    # Kick off the transfers.
    while 1:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM: break
    # Loop until every transfer has finished.
    while num_handles:
        # Wait (up to 1 second) for sockets to become ready.
        ret = m.select(1.0)
        if ret == -1:  continue
        while 1:
            ret, num_handles = m.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM: break
    return handles


res = LoadMulti(['http://pycurl.sourceforge.net/doc/pycurl.html', 'http://pycurl.sourceforge.net/doc/curlobject.html', 'http://pycurl.sourceforge.net/doc/curlmultiobject.html'])
for url, d in res.iteritems():
    print url, d['handle'].getinfo(pycurl.HTTP_CODE), len(d['data'].getvalue()), len(d['header'].getvalue())

You can run GUI updates inside these while loops so the interface does not freeze.
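For instance, with PyQt the outer loop could look roughly like this (just a sketch; it assumes a QApplication instance named `app` already exists, and shortens the select timeout so the GUI stays responsive):

    while num_handles:
        # A short select timeout keeps the event loop from stalling.
        ret = m.select(0.1)
        # Let Qt repaint widgets and handle input between transfers.
        app.processEvents()
        if ret == -1:  continue
        while 1:
            ret, num_handles = m.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM: break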

Daniel Kluev
awesome answer, thanks!
Tom