Hi all, I was wondering: when I use urllib2.urlopen(), does it just read the headers, or does it actually bring back the entire webpage?
I.e., does the HTML page actually get fetched on the urlopen() call or on the read() call?
import urllib2
handle = urllib2.urlopen(url)  # is the page fetched here...
html = handle.read()           # ...or here?
The reason I ask is for this workflow...
- I have a list of urls (some of them from URL-shortening services)
- I only want to read the webpage if I haven't seen that url before
- I need to call urlopen() and use geturl() to get the final page that link goes to (after the 302 redirects) so I know if I've crawled it yet or not.
- I don't want to incur the overhead of having to grab the html if I've already parsed that page.
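Roughly, the workflow above would look like the sketch below. The names fetch_if_new, seen and opener are made up for illustration, and the sketch assumes that skipping read() and closing the handle avoids pulling down most of the body, which is exactly the open question here. The import fallback is only there so the same code also runs on Python 3, where urlopen lives in urllib.request.

```python
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent


def fetch_if_new(url, seen, opener=urlopen):
    """Return (final_url, html); html is None if final_url was already seen.

    `seen` is a set of final URLs already crawled (hypothetical state).
    `opener` is injectable so the logic can be tried without a network.
    """
    handle = opener(url)
    try:
        final_url = handle.geturl()  # URL after any redirects were followed
        if final_url in seen:
            # Already crawled: return without ever calling read()
            return final_url, None
        seen.add(final_url)
        return final_url, handle.read()
    finally:
        handle.close()  # always release the connection
```

Usage would be something like: keep one seen = set() for the whole crawl, call fetch_if_new(url, seen) for each url in the list, and only parse when the returned html is not None.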
thanks!