
Here's a simple Python function that checks whether a given URL is valid, that is, whether a HEAD request for it returns status 200:

    from httplib import HTTP
    from urlparse import urlparse

    def checkURL(url):
        p = urlparse(url)           # p[1] is the host, p[2] the path
        h = HTTP(p[1])
        h.putrequest('HEAD', p[2])
        h.endheaders()
        if h.getreply()[0] == 200:  # getreply() returns (status, reason, headers)
            return 1
        else:
            return 0

This works for most sites, but on my Django-based site I get a 200 status code even for a URL that is clearly wrong. If I view the same page in a browser, I get a 404. For example, the following page gives a 404 in a browser: http://wefoundland.com/GooseBumper

But it gives a 200 when checked with this script. Why?

Edit: While mopoke's answer solved the issue on the Django side, there was also a bug in the script above. Instead of parsing the URL and then using

    h.putrequest('HEAD', p[2])

I needed to pass the full URL in the request, like so:

    h.putrequest('HEAD', url)

That solved the issue.
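For readers on Python 3, where httplib and urlparse became http.client and urllib.parse, a minimal sketch of the same HEAD check might look like the following. The names check_url and request_target are illustrative, not from the original script. This sketch sends the path plus any query string as the request target and falls back to "/" for a bare host such as http://example.com, where the original's p[2] would be empty; the edit above sends the full URL instead, which is the absolute form of a request target and is also accepted by HTTP/1.1 servers.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url):
    """Hypothetical helper: the path plus query string of url, or "/" if empty."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def check_url(url, timeout=10):
    """Hypothetical port of checkURL: True if a HEAD request returns 200."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    # Like p[1] in the original, netloc keeps any explicit port ("host:8000").
    conn = conn_class(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", request_target(url))
        return conn.getresponse().status == 200
    finally:
        conn.close()
```

Calling check_url("http://wefoundland.com/GooseBumper") makes a live request, so its result depends on what that server sends at the time.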

A: 

Your page isn't actually returning a 404 status code:

    alex@alex-laptop:~$ curl -I http://wefoundland.com/GooseBumper
    HTTP/1.1 200 OK
    Server: nginx
    Date: Wed, 30 Dec 2009 01:37:41 GMT
    Content-Type: text/html; charset=utf-8
    Transfer-Encoding: chunked
    Connection: keep-alive
Alex Gaynor
A: 

Although the content says 404, the site is returning 200 OK in the headers:

    HTTP/1.1 200 OK
    Server: nginx
    Date: Wed, 30 Dec 2009 01:38:24 GMT
    Content-Type: text/html; charset=utf-8
    Connection: close

Make sure your response uses HttpResponseNotFound, for example:

    return HttpResponseNotFound('<h1>Page not found</h1>')
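A HEAD check like the one in the question only sees the status line, never the page body. The following self-contained sketch reproduces that locally, using Python's built-in http.server in place of Django; the handler and the /soft404 and /real404 paths are hypothetical. Both paths serve the same "Page not found" HTML, but only /real404 sends a 404 status, which is what returning HttpResponseNotFound does in a Django view.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<h1>Page not found</h1>"


class Handler(BaseHTTPRequestHandler):
    """Hypothetical handler: /soft404 mimics the bug, /real404 the fix."""

    def do_HEAD(self):
        # The status line is chosen here, independently of the page content.
        status = 404 if self.path == "/real404" else 200
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self.do_HEAD()
        self.wfile.write(PAGE)  # same "not found" body for both paths

    def log_message(self, *args):
        pass  # keep the demo quiet


def head_status(port, path):
    """Send a HEAD request to the local server and return only the status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

soft = head_status(port, "/soft404")
real = head_status(port, "/real404")
server.shutdown()
server.server_close()
print(soft, real)  # 200 404
```

A browser would render the same "Page not found" text for both paths, which is why the page looked like a 404 even though the server reported 200.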
mopoke
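I was using render_to_response with a custom template, so I changed it to the following:

    from django.http import HttpResponseNotFound
    from django.template import Context
    from django.template.loader import get_template

    t = get_template('404.html')
    c = Context(my_context)
    html = t.render(c)
    return HttpResponseNotFound(html)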
Goose Bumper
A: 

To get your Django view to return a 404 status, use HttpResponseNotFound instead of HttpResponse, or pass status=404 to the HttpResponse constructor.

spookylukey