Here's a simple Python function that checks whether a given URL is valid:
from httplib import HTTP
from urlparse import urlparse

def checkURL(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    if h.getreply()[0] == 200:
        return 1
    else:
        return 0
This works for most sites, but with my Django-based site I get a 200 status code even when I enter a URL that is clearly wrong. If I view the same page in a browser, I get a 404. For example, the following page gives a 404 in a browser: http://wefoundland.com/GooseBumper
But it gives a 200 when checked with this script. Why?
Edit: While mopoke's answer solved the issue on the Django side, there was also a bug in the script above. Instead of parsing the URL and then using
h.putrequest('HEAD', p[2])
I actually needed to pass the full URL in the request, like so:
h.putrequest('HEAD', url)
That solved the issue.
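For anyone on Python 3, where `httplib` and `urlparse` no longer exist under those names, here is a rough sketch of the same check using `http.client` and `urllib.parse`. The names `request_target` and `check_url` and the `timeout` default are my own choices for illustration, not part of the original script. One possible reason the full-URL fix above was needed is that the legacy `httplib.HTTP` class sends an HTTP/1.0 request without a `Host` header, so a server hosting several sites cannot tell which one is meant; I have not verified that this is what happened here. `http.client.HTTPConnection` sends HTTP/1.1 and adds the `Host` header itself, so passing only the path should be enough in this version.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus query string, if any) to put in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path means the site root
    if parts.query:
        target += "?" + parts.query
    return target


def check_url(url, timeout=10):
    """Return True if a HEAD request for url is answered with status 200.

    Redirects are not followed, so a 301/302 counts as False here.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        return False  # no host to connect to
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    try:
        conn = conn_cls(parts.netloc, timeout=timeout)
    except http.client.InvalidURL:
        return False  # e.g. a non-numeric port in the URL
    try:
        # http.client adds the Host header automatically for HTTP/1.1.
        conn.request("HEAD", request_target(url))
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False  # connection refused, timeout, malformed reply, ...
    finally:
        conn.close()
```

Calling `check_url("http://wefoundland.com/GooseBumper")` would then send `HEAD /GooseBumper` with a `Host: wefoundland.com` header. Whether it returns `False` still depends on the server answering a HEAD request for a missing page with a 404.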