For those who know wget, it has an option, --spider, which allows one to check whether a link is broken or not without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100,000 links that I want to check at most once a day and at least once a week. Either way, this will generate a lot of unnecessary traffic.
As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?
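Something along these lines is what I have in mind: a minimal sketch using Python 2's urllib2, where a HEAD request asks the server for headers only instead of the page body (the HeadRequest and is_link_alive names are just my own placeholders):

    import urllib2

    class HeadRequest(urllib2.Request):
        # urllib2 sends GET by default; overriding get_method makes it send HEAD,
        # so only the response headers come back, not the page itself
        def get_method(self):
            return "HEAD"

    def is_link_alive(url):
        try:
            response = urllib2.urlopen(HeadRequest(url), timeout=10)
            return response.getcode() < 400
        except urllib2.HTTPError:
            # 4xx/5xx responses are raised as HTTPError, so treat them as broken
            return False
        except urllib2.URLError:
            # DNS failures, refused connections, timeouts, etc.
            return False

I am not sure whether this is the idiomatic way, or whether every server handles HEAD correctly.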
Best,
Troels