views: 82
answers: 2

I wrote a scraper in Python a while back, and it worked fine on the command line. I have now made a GUI for the application, but I am having trouble with one issue. When I attempt to update text inside the GUI (e.g. 'fetching URL 12/50'), I can't, because the scraper function is busy grabbing 100+ links. Also, when going from one scraping function to a function that should update the GUI, and then on to another scraping function, the GUI update function seems to be skipped over while the next scrape function runs. An example would be:

scrapeLinksA() #takes 20 seconds
updateInfo("LinksA done")
scrapeLinksB() #takes another 20 seconds

In the above example, updateInfo is never executed unless I end the program with a KeyboardInterrupt.

I'm thinking my solution is threading, but I'm not sure. What can I do to fix this?

I am using:

  • PyQt4
  • urllib2
  • BeautifulSoup
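
The freeze described above is typical when long-running work runs on the GUI thread. The threading idea can be sketched with a plain-Python analogue: do the scraping in a worker thread and pass progress messages back through a thread-safe queue (in PyQt you would emit a signal instead of polling a queue; the function and message names below are made up for illustration):

```python
import queue
import threading
import time

def scrape_links(label, updates):
    """Simulated long-running scrape; posts a progress message when done."""
    time.sleep(0.1)  # stands in for ~20 seconds of real scraping
    updates.put("%s done" % label)

def run_scrapes():
    updates = queue.Queue()  # thread-safe channel back to the GUI thread
    worker = threading.Thread(
        target=lambda: (scrape_links("LinksA", updates),
                        scrape_links("LinksB", updates)))
    worker.start()
    # The GUI thread stays free to repaint; here we just drain the queue.
    messages = []
    while len(messages) < 2:
        messages.append(updates.get(timeout=5))
    worker.join()
    return messages

print(run_scrapes())  # → ['LinksA done', 'LinksB done']
```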
+2  A: 

I'd suggest using QNetworkAccessManager for a non-blocking way of downloading the websites. It's a different approach, so you will probably have to rewrite the handling part of your application. Instead of waiting until a page is downloaded so that you can parse it, you have multiple smaller functions connected via signals, which are executed when certain events happen (e.g. "the page has been downloaded").
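
The callback-driven shape can be sketched without Qt using the standard library's `concurrent.futures`; the real PyQt route would instead connect a slot to QNetworkAccessManager's `finished` signal. The URLs and handler names below are made up, and `fake_download` stands in for the actual network fetch:

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def on_page_downloaded(future):
    # Analogue of a slot connected to the "finished" signal:
    # runs when the download completes, then handles the result.
    results.append(future.result().upper())

def fake_download(url):
    # Stand-in for the actual network fetch (no real I/O here).
    return "html of %s" % url

with ThreadPoolExecutor(max_workers=2) as pool:
    for url in ["http://example.com/a", "http://example.com/b"]:
        pool.submit(fake_download, url).add_done_callback(on_page_downloaded)

# The pool's shutdown (end of the with-block) waits for all downloads,
# so every callback has fired by this point.
print(sorted(results))
```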

Lukáš Lalinský
+2  A: 

Lukáš Lalinský's answer is very good.

Another possibility would be to use the PyQt threads.

If the problem is merely the 'updating' part (and not the need for asynchronous processing), try putting this call:

QCoreApplication.processEvents()

between scrapeLinksA and scrapeLinksB to see if that helps (it temporarily hands control back to the main event loop so that pending events, e.g. paint requests, can be processed).

If that doesn't help, please provide us with the source of updateInfo.

ChristopheD