import httplib
def httpCode(theurl):
    if theurl.startswith("http://"): theurl = theurl[7:]
    head = theurl[:theurl.find('/')]
    tail = theurl[theurl.find('/'):]
    response_code = 0
    conn = httplib.HTTPConnection(head)
    conn.request("HEAD",tail)
    res = conn.getresponse()
    response_code = int(res.status)
    return response_code

Basically, this function takes a URL and returns its HTTP status code (200, 404, etc.). The error I got was:

Exception Value:  (-2, 'Name or service not known')

I have to do it this way, with a HEAD request. The URLs I pass in usually point to large video files, so I need to get the HTTP code from the response headers alone. Downloading the whole file first and then checking the code would take too long.

Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> def httpCode(theurl):
...     if theurl.startswith("http://"): theurl = theurl[7:]
...     head = theurl[:theurl.find('/')]
...     tail = theurl[theurl.find('/'):]
...     response_code = 0
...     conn = httplib.HTTPConnection(head)
...     conn.request("HEAD",tail)
...     res = conn.getresponse()
...     response_code = int(res.status)
...     print response_code
...
>>> httpCode('http://youtube.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in httpCode
  File "/usr/lib/python2.6/httplib.py", line 874, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.6/httplib.py", line 911, in _send_request
    self.endheaders()
  File "/usr/lib/python2.6/httplib.py", line 868, in endheaders
    self._send_output()
  File "/usr/lib/python2.6/httplib.py", line 740, in _send_output
    self.send(msg)
  File "/usr/lib/python2.6/httplib.py", line 699, in send
    self.connect()
  File "/usr/lib/python2.6/httplib.py", line 683, in connect
    self.timeout)
  File "/usr/lib/python2.6/socket.py", line 498, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
>>>
+1  A: 

Your code worked for me, and for one other person who commented. That suggests the particular URL you're passing is tripping up your parsing. Print both head and tail to see what the function thinks the host is. For example:

head = theurl[:theurl.find('/')]
print head
tail = theurl[theurl.find('/'):]
print tail

Once you can see what head and tail are, you can determine whether head really should be resolvable. For example, what if the URL was:

http://myhost.com:8080/blah/blah

It would fail because of the port number.

Kaleb Pederson
Can you check my new edited post? I did it exactly like 'http://youtube.com'
TIMEX
Or if you left off the trailing slash, as in http://google.com, since find would return -1. Head would be google.co and tail would be m.
sberry2A
There's your problem, you need a trailing slash.
sberry2A
+1  A: 

As suggested in a comment by Adam Crossland, you should check your head and tail values. In your case, without a trailing slash, you end up with:

head = "youtube.co"
tail = "m"

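str.find returns -1 when the substring is not found, so theurl[:-1] grabs everything but the last character for head and theurl[-1:] grabs only the last character for tail. The lookup for "youtube.co" is what fails with "Name or service not known".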
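
To see the slicing problem without touching the network, here is a minimal sketch. The names split_naive and split_checked are made up for illustration; split_naive repeats the slicing from the question, and split_checked is one possible guard, assuming a missing path should be treated as "/".

```python
def split_naive(theurl):
    # Same slicing as httpCode in the question, minus the network call.
    if theurl.startswith("http://"):
        theurl = theurl[7:]
    idx = theurl.find('/')  # -1 when there is no '/' after the host
    return theurl[:idx], theurl[idx:]


def split_checked(theurl):
    # Hypothetical fix: handle the "no slash" case explicitly.
    if theurl.startswith("http://"):
        theurl = theurl[7:]
    idx = theurl.find('/')
    if idx == -1:
        return theurl, '/'  # assumption: no path means the root path
    return theurl[:idx], theurl[idx:]


print(split_naive('http://youtube.com'))    # ('youtube.co', 'm')
print(split_checked('http://youtube.com'))  # ('youtube.com', '/')
```

With a trailing slash, both functions return the same pair, which is why the original code worked for the people who tested it with one.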

sberry2A
Alex, you should strongly consider using urlparse. It is going to do a better job of dealing with URLs: http://docs.python.org/library/urlparse.html
Adam Crossland
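
A rough sketch of what the urlparse suggestion could look like, not necessarily what the commenter had in mind. The helper names split_url and http_code are invented here, and the fallback imports are only there so the same snippet also runs on Python 3, where the modules were renamed. It assumes plain http:// URLs and that an empty path should be requested as "/".

```python
try:
    import httplib                      # Python 2, as in the question
    from urlparse import urlparse
except ImportError:
    import http.client as httplib       # Python 3 name for the same module
    from urllib.parse import urlparse


def split_url(theurl):
    """Return (host, path) for a URL; hypothetical helper."""
    parts = urlparse(theurl)
    path = parts.path or '/'            # assumption: empty path means root
    if parts.query:
        path += '?' + parts.query       # keep the query string in the request
    return parts.netloc, path


def http_code(theurl):
    """Send a HEAD request and return the status code without the body."""
    host, path = split_url(theurl)
    conn = httplib.HTTPConnection(host)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling split_url('http://youtube.com') gives ('youtube.com', '/'), so the missing trailing slash no longer matters. http_code needs network access, and the code it returns may well be a redirect (301 or 302) rather than 200, since HEAD requests are not followed automatically.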