So I wrote a short script to download the comic images from explosm.net (Cyanide and Happiness). I found out about the comic fairly recently and I want to put it on my iPhone 3G.
It works fine: urllib2 fetches each page's HTML, and urllib's `URLopener().retrieve()` saves the image.
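For reference, here is roughly what those two steps look like in Python 3, where urllib2 and urllib were merged into `urllib.request`. This is a minimal sketch: the helper names `fetch_html` and `save_image` are hypothetical, and decoding the page as UTF-8 is an assumption about the site's encoding.

```python
import urllib.request


def fetch_html(url, timeout=30):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def save_image(url, filename):
    """Save the resource at `url` to `filename` and return the local path."""
    # urlretrieve returns a (filename, headers) tuple; only the path is needed
    path, _headers = urllib.request.urlretrieve(url, filename)
    return path
```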
Why I'm posting this on SO: how can I make this code faster? Would regular expressions help? Is the bottleneck my internet connection, or is my algorithm just poor?
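Regular expressions probably wouldn't change the speed much, since searching one page's HTML takes very little time compared with waiting for the download, but they can make the extraction shorter. A hedged sketch, assuming the `<img>` tag keeps the same `alt` text and `src` prefix that the script's `matchingStart` string relies on; `COMIC_IMG_RE` and `extract_comic_url` are hypothetical names.

```python
import re

# Matches the comic's <img> tag and captures the quoted src URL.
# Assumes the same alt text and URL prefix the original script searches for.
COMIC_IMG_RE = re.compile(
    r'<img alt="Cyanide and Happiness, a daily webcomic" '
    r'src="(http://www\.explosm\.net/db/files/Comics/[^"]+)"'
)


def extract_comic_url(html):
    """Return the comic image URL from a page's HTML, or None if not found."""
    match = COMIC_IMG_RE.search(html)
    return match.group(1) if match else None
```

Returning `None` when the tag is missing also makes it easy to skip pages whose layout differs.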
Any suggestions for improving speed or general code aesthetics would be greatly appreciated.
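If most of the time is spent waiting on the network (likely, since the loop fetches each page and image one after another), downloading several comics in parallel should help more than tweaking the parsing. A minimal sketch using `concurrent.futures.ThreadPoolExecutor`, which is in the standard library from Python 3.2 on (not in 2.5); `download_all`, `download_one`, and the default of 4 workers are illustrative assumptions, not part of the original script.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(page_numbers, download_one, max_workers=4):
    """Call download_one(page_number) for every page using a thread pool.

    download_one is any function that fetches and saves one comic and
    returns a truthy value on success. Returns {page_number: succeeded}.
    """
    page_numbers = list(page_numbers)

    def attempt(page_number):
        # One failed page should not abort the whole batch
        try:
            return bool(download_one(page_number))
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly
        return dict(zip(page_numbers, pool.map(attempt, page_numbers)))
```

Keeping the worker count small is a deliberate choice: it speeds things up without hammering the site with dozens of simultaneous requests.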
Thank you.
--------------------------------CODE----------------------------------
```python
import urllib, urllib2

def LinkConvert(string_link):
    # Replace each space in the URL with "%20"
    for eachLetter in string_link:
        if eachLetter == " ":
            string_link = string_link[:string_link.find(eachLetter)] + "%20" + string_link[string_link.find(eachLetter)+1:]
    return string_link

start = 82
end = 1506  # last comic number; not used yet, the loop below only fetches 7 pages
matchingStart = """<img alt="Cyanide and Happiness, a daily webcomic" src="http://www.explosm.net/db/files/Comics/"""
matchingEnd = """></"""
link = "http://www.explosm.net/comics/"

for pageNum in range(start, start + 7):
    # Fetch the HTML of the comic page
    req = urllib2.Request(link + str(pageNum))
    response = urllib2.urlopen(req)
    page = response.read()

    # Cut out the comic's <img ...> tag
    istart1 = page.find(matchingStart)
    if istart1 == -1:
        print "Uh-oh! No comic found on page " + str(pageNum) + "!"
        continue
    iend1 = page.find(matchingEnd, istart1)
    newString1 = page[istart1 : iend1]

    # Take the URL between the quotes after src=
    istart2 = newString1.find("src=") + 4
    iend2 = len(newString1)
    final = newString1[istart2 + 1 : iend2 - 1]
    final = LinkConvert(final)

    # Save the image as <pageNum>.jpg
    try:
        image = urllib.URLopener()
        image.retrieve(final, str(pageNum) + ".jpg")
    except IOError:
        print "Uh-oh! " + str(pageNum) + " was not downloaded!"
    else:
        print str(pageNum) + " completed..."
```
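Two smaller cleanups may also be worth a look. `LinkConvert` rescans the string with `find()` for every space, while `str.replace` does the same job in one pass. And every file is saved as `.jpg`, even if some comics are PNGs or GIFs; taking the extension from the image URL avoids that. A Python 3 sketch with hypothetical helper names `link_convert` and `local_filename`, assuming the URL's path ends in the file's real extension:

```python
import os
from urllib.parse import urlsplit


def link_convert(url):
    """Equivalent of the original LinkConvert: turn every space into %20."""
    return url.replace(" ", "%20")


def local_filename(page_number, image_url):
    """Name the file after the page number but keep the image's own extension."""
    path = urlsplit(image_url).path
    extension = os.path.splitext(path)[1] or ".jpg"  # fall back to .jpg if none
    return f"{page_number}{extension}"
```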
By the way, this is Python 2.5 code, not 3.0, but you bet I'll have studied and played around with all of Python 3.0's features before or right after New Year (after college apps, YAY! ^-^)