ansaurus

Question

Download from EXPLOSM.net Comics Script [Python]

Answer 1

A:

I suggest using BeautifulSoup to do the parsing; it would simplify your code a lot.

But since you already have it working this way, you may not want to touch it until it breaks (for example, when the page format changes).
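To give an idea of what the BeautifulSoup version could look like, here is a minimal sketch. The HTML fragment, the `main-comic` id and the `find_comic_url` helper are all assumptions made up for illustration; the real page markup may well differ, so check the page source before relying on any of it. It uses the `bs4` package (Beautiful Soup 4) with Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the real site's markup may differ.
SAMPLE_HTML = """
<html><body>
  <div id="comic-wrap">
    <img id="main-comic" src="//files.example.com/comics/sample.png" alt="comic">
  </div>
  <img src="/static/logo.png" alt="logo">
</body></html>
"""


def find_comic_url(html, img_id="main-comic"):
    """Return the src of the <img> with the given id, or None if absent.

    The default id is an assumption for this sketch, not the site's real id.
    """
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", id=img_id)
    return img["src"] if img is not None else None


print(find_comic_url(SAMPLE_HTML))
```

The point is that the lookup becomes one `find` call instead of manual string slicing, and a missing image shows up as `None` rather than a wrong offset.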

nosklo 2008-12-27 13:54:56

Answer 2

+7 A:

I would suggest using Scrapy for your page fetching and Beautiful Soup for the parsing. This would make your code a lot simpler.

Whether you want to switch your existing, working code to these alternatives is up to you. If not, then regular expressions would probably simplify your code somewhat. I'm not sure what effect they would have on performance.
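If you go the regular expression route, something along these lines might do. This is only a sketch: the pattern assumes the image URL sits in a double-quoted `src` attribute ending in a common image extension, and `extract_image_urls` is a hypothetical helper name. Regexes are brittle against markup changes, so treat the pattern as a starting point.

```python
import re

# Assumed pattern: an <img> tag with a double-quoted src ending in an
# image extension. Adjust it to match the actual page source.
IMG_SRC_RE = re.compile(
    r'<img[^>]+src="([^"]+\.(?:png|gif|jpe?g))"',
    re.IGNORECASE,
)


def extract_image_urls(html):
    """Return every matching image URL found in the HTML string, in order."""
    return IMG_SRC_RE.findall(html)


# Made-up sample input for illustration.
sample = (
    '<img alt="x" src="http://example.com/a.png"> '
    '<img src="/b.GIF"><img src="/c.svg">'
)
print(extract_image_urls(sample))
```

With the sample above, the `.svg` image is skipped because its extension is not in the pattern.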

Mat 2008-12-27 14:04:48

Answer 3

+3 A:

refactormycode may be a more appropriate web site for these "let's improve this code" type of discussions.

hayalci 2008-12-27 14:53:42

Answer 4

A:

urllib2 uses blocking calls, and that's the main reason for the slow performance. You should use a non-blocking library (like Scrapy) or use multiple threads for the retrieval. I have never used Scrapy (so I can't comment on that option), but threading in Python is really easy and straightforward.
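As a rough sketch of the multi-threaded option, here is one way to do it with a thread pool. It is written for Python 3, where `urllib2` became `urllib.request`, and it uses `concurrent.futures.ThreadPoolExecutor` instead of hand-managed `threading.Thread` objects. The names `fetch` and `fetch_all`, the worker count and the timeout are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes (blocking call)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch all URLs concurrently; return (url, result) pairs in input order.

    `fetcher` can be swapped out, for example with a stub when testing
    without network access.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return list(zip(urls, pool.map(fetcher, urls)))


# Demo with a stub fetcher so nothing is actually downloaded here.
demo_urls = ["http://example.com/1", "http://example.com/2"]
print(fetch_all(demo_urls, fetcher=lambda u: u.encode()))
```

While one thread waits on the network, the others keep downloading, which is where the speed-up comes from. Note that `pool.map` re-raises the first exception a worker hits, so a real script would probably want to catch errors inside the fetcher.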

Roberto Liffredo 2008-12-27 21:56:24
