I have a tkinter GUI that downloads data from multiple websites at once. I run a separate thread for each download (about 28). Is that too many threads for one GUI process? It's really slow: each individual page should take about 1 to 2 seconds, but when all are run at once it takes over 40 seconds. Is there any way I can shorten the time it takes to download all the pages? Any help is appreciated, thanks.

+2  A: 

It's probably the GIL (Global Interpreter Lock) that gets in your way. Only one thread can execute Python bytecode at a time, so Python can perform poorly with many threads.

You could try twisted.web.client.getPage (see http://twistedmatrix.com/projects/core/documentation/howto/async.html a bit down the page). I don't have benchmarks for that, but taking the example on that page and adding 28 deferreds should give you a comparable result pretty quickly. Keep in mind that you'd have to hook Twisted's reactor into your GUI's event loop (for a tkinter GUI that would be twisted.internet.tksupport rather than the gtk reactor) and get into Twisted's programming style, though.

buster
I doubt it: that's why TCP sockets are buffered. I'll have to give it a try out of curiosity, but I'd expect socket buffering to take care of this for the most part.
Glenn Maynard
I read through that page and it looks pretty useful. I might try Twisted if I can't find anything simpler. Thanks.
Upvote for the twisted recommendation. It's far, far easier to write and debug code that uses twisted than it would be for the equivalent threaded code.
Aaron Gallagher
If you try the twisted way, please post your results here. I'd be interested in the outcome :)
buster
Sure, I'll definitely follow up on this. It might take me some time, though, since I don't have the time available right now.
A: 

You can try using processes instead of threads. Python has the GIL, which might cause some delays in your situation.
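
A minimal sketch of what that could look like with the standard library's concurrent.futures, not a drop-in fix. The `fetch` function is a stand-in that sleeps instead of downloading, and the URLs, `download_all`, and the worker count are all made up for illustration. Swap the body of `fetch` for the real request.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def fetch(url):
    """Stand-in for a real download: sleeps briefly instead of hitting the network."""
    time.sleep(0.1)
    return url, len(url)


def download_all(urls, executor_cls=ProcessPoolExecutor, workers=4):
    """Run fetch() for every URL in a pool of workers; results come back in input order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    urls = ["http://example.com/page%d" % i for i in range(28)]
    start = time.time()
    results = download_all(urls)
    print("fetched %d pages in %.2f s" % (len(results), time.time() - start))
```

The `if __name__ == "__main__":` guard matters on Windows, where worker processes re-import the main module. Passing `ThreadPoolExecutor` as `executor_cls` runs the same code with threads, which makes it easy to compare the two.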

freiksenet
+1  A: 

A process can have hundreds of threads on any modern OS without any problem.

If you're bandwidth-limited, the downloads effectively share one pipe, so 28 pages at 1 to 2 seconds each comes to 28 to 56 seconds in total, and 40 seconds is about right. If you're latency-limited, it should be faster, but with no information, all I can suggest is:

  • add logging to your code to make sure it's actually running in parallel, and that you're not accidentally serializing your threads somehow;
  • use a network monitor to make sure that network requests are actually going out in parallel.
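
The first check above could be sketched roughly like this. It is only an illustration: `fetch` is a placeholder that sleeps, and `timed_fetch`, `run_all`, and `overlapped` are hypothetical helper names, not anything from the asker's code.

```python
import logging
import threading
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(threadName)s %(message)s")


def fetch(url):
    """Placeholder for the real download; replace the sleep with the actual request."""
    time.sleep(0.2)


def timed_fetch(url, spans):
    """Log when a download starts and ends, and record the (start, end) pair."""
    start = time.time()
    logging.info("start %s", url)
    fetch(url)
    end = time.time()
    logging.info("done  %s (%.2f s)", url, end - start)
    spans.append((start, end))  # list.append is thread-safe in CPython


def run_all(urls):
    """Start one thread per URL, wait for all of them, and return the recorded spans."""
    spans = []
    threads = [threading.Thread(target=timed_fetch, args=(u, spans)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return spans


def overlapped(spans):
    """True if every download started before the first one finished."""
    first_end = min(end for _, end in spans)
    return all(start < first_end for start, _ in spans)
```

If `overlapped(run_all(urls))` comes back False, or the "start" lines in the log are spread out instead of clustered, something is serializing the threads.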

It's hard to give anything better without more information.

Glenn Maynard
I checked, and the threads all start at the same time and run in parallel, but they return from the urllib request at different times. I'm assuming it has something to do with the network handling multiple requests at once. Is that the problem, or is it something else? Can I have multiple network requests running at once? I'm on Windows XP with a wireless connection, if that helps. Thanks for your help, and excuse my ignorance; I'm not much of a network person.