Hi there. In my program I need to download 3-4 files simultaneously (from different servers which are quite slow). I'm aware of the solution involving python threads or qt threads, but I'm wondering: since it seems to be a quite common task, maybe there's a library which I feed with urls and simply receive the files? Thanks in advance!
+1
A:
You don't really need a library; this is a simple use of threads (well, inasmuch as threads can be 'simple'). See e.g. http://www.artfulcode.net/articles/multi-threading-python/ for a neat tutorial.
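For instance, a minimal sketch along those lines (assuming Python 2 / urllib2 to match the rest of this thread; the urls below are just placeholders) could be:

import threading
import urllib2

def fetch(url, results):
    # Each worker thread downloads one url and stores the body keyed by url.
    results[url] = urllib2.urlopen(url).read()

urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']  # placeholder urls
results = {}
threads = [threading.Thread(target=fetch, args=(url, results)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until all downloads have finished
for url, body in results.iteritems():
    print url, len(body)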
katrielalex
2010-08-23 10:08:56
+3
A:
Yes, there is one - pycurl.
It's not exactly 'simple', since curl is fairly low-level, but it does exactly what you need - you provide it some urls and it downloads them simultaneously and asynchronously.
import pycurl
from StringIO import StringIO

def LoadMulti(urls):
    m = pycurl.CurlMulti()
    handles = {}
    for url in urls:
        # One easy handle per url; body and headers are collected in StringIO buffers.
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        data = StringIO()
        header = StringIO()
        c.setopt(pycurl.WRITEFUNCTION, data.write)
        c.setopt(pycurl.HEADERFUNCTION, header.write)
        handles[url] = dict(data=data, header=header, handle=c)
        m.add_handle(c)
    # Kick off the transfers.
    while 1:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM: break
    # Drive all transfers until every handle has finished.
    while num_handles:
        ret = m.select(1.0)
        if ret == -1: continue
        while 1:
            ret, num_handles = m.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM: break
    return handles

res = LoadMulti(['http://pycurl.sourceforge.net/doc/pycurl.html', 'http://pycurl.sourceforge.net/doc/curlobject.html', 'http://pycurl.sourceforge.net/doc/curlmultiobject.html'])
for url, d in res.iteritems():
    print url, d['handle'].getinfo(pycurl.HTTP_CODE), len(d['data'].getvalue()), len(d['header'].getvalue())
You can run GUI updates inside these while loops so the interface does not freeze.
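For example, if the GUI is a PyQt one (an assumption here, since the question mentions qt threads), the completion loop from LoadMulti could process pending events between select() calls:

# Inside LoadMulti(), assuming PyQt4 is the GUI toolkit in use:
from PyQt4 import QtGui

while num_handles:
    ret = m.select(1.0)
    if ret == -1: continue
    QtGui.QApplication.processEvents()  # let pending GUI events run between transfers
    while 1:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM: break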
Daniel Kluev
2010-08-23 10:55:09
awesome answer, thanks!
Tom
2010-08-23 11:41:13