In Python, how would I check whether a URL ending in .jpg exists?

Example: http://www.fakedomain.com/fakeImage.jpg

Thanks

A: 

I think you can try sending an HTTP request to the URL and reading the response. If no exception is raised, it probably exists.
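
A minimal sketch of that idea, assuming Python 3 (where urllib2 became urllib.request). The name url_exists and the 5-second default timeout are illustrative choices, not part of any library. Keep in mind that urlopen follows redirects, so a URL that redirects to some other existing page also reports True here.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=5.0):
    """Return True if a HEAD request to `url` ends in a 2xx response."""
    # HEAD asks for headers only, so the image itself is not downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # The server answered, but with an error status such as 404 or 500.
        return False
    except OSError:
        # URLError and socket timeouts are OSError subclasses: DNS failure,
        # refused connection, timeout, and similar network problems.
        return False
```

Something like url_exists('http://www.fakedomain.com/fakeImage.jpg') would then give a plain True or False.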

SpawnCxy
That's what I tried doing, but I couldn't find any specific code samples. Would you happen to have one?
@user257543 It seems you've got a good one :)
SpawnCxy
+2  A: 

It looks like http://www.fakedomain.com/fakeImage.jpg is automatically redirected to http://www.fakedomain.com/index.html without any error.

Redirects for 301 and 302 responses are followed automatically, and the intermediate response is never returned to the caller.

Take a look at HTTPRedirectHandler; you might need to subclass it to handle that.

Here is a sample from diveintopython.org:

http://diveintopython.org/http_web_services/redirects.html
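
As a rough sketch of the subclassing approach, assuming Python 3 (where the class lives in urllib.request rather than urllib2): the names NoRedirectHandler and exists_without_redirect are made up for illustration. The handler declines every redirect, so a 301 or 302 surfaces as an HTTPError instead of being followed silently.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow". The opener then falls back
        # to its default error handling and raises HTTPError for the 3xx.
        return None


def exists_without_redirect(url, timeout=5.0):
    """Return True only if `url` itself answers a HEAD request with 2xx."""
    # build_opener swaps the default redirect handler for the subclass.
    opener = urllib.request.build_opener(NoRedirectHandler)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers HTTPError (including the refused redirect), URLError,
        # and socket timeouts.
        return False
```

With this, a fakeImage.jpg that redirects to index.html would count as missing rather than as found.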

S.Mark
I think fakedomain.com is just an example, as the name suggests, and you don't actually need to visit it yourself. :-)
SpawnCxy
@SpawnCxy, at first I thought so too, but when I go to that URL, fakeImage.jpg does not exist and it's redirected to index.html, so I am assuming it's more than an example.
S.Mark
+4  A: 
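import httplib

# HEAD asks only for the headers, so the image body is never downloaded
conn = httplib.HTTPConnection('www.fakedomain.com')
conn.request('HEAD', '/fakeImage.jpg')
r1 = conn.getresponse()
conn.close()
# httplib does not follow redirects, so only a direct 200 counts as found
if r1.status == 200:
    print 'file exists'
else:
    print 'file does not exist'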

That works for me, anyway. I should point out that the guts of this program are lifted directly from the Python httplib documentation.
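
For anyone on Python 3, httplib was renamed to http.client. A hedged equivalent might look like the sketch below. The helper name head_status and its port and timeout parameters are my own additions for illustration. It returns the raw status code, so a 301 or 302 shows up as such because http.client does not follow redirects.

```python
import http.client


def head_status(host, path, port=80, timeout=5.0):
    """Return the status code of a HEAD request, or None if it fails."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a malformed reply.
        return None
    finally:
        # Always release the socket, whether or not the request worked.
        conn.close()
```

The check from the code above then becomes head_status('www.fakedomain.com', '/fakeImage.jpg') == 200.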

tikiboy
+1, although I'd imagine using `HEAD` instead of `GET` in the call to `conn.request` would be more efficient, since you're only checking to see if it exists.
Daniel Roseman
@Daniel, thanks for that tip. I've updated the code to use HEAD.
tikiboy
A: 

I don't know why you are doing this, but in any case, note that a successful request to an "image" URL doesn't mean the response is what you think it is. It could redirect to anything, or return data of any type, and potentially cause problems depending on what you do with the response.

Sorry, I went on a binge reading about online exploits and how to defend against them today :P
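
One way to narrow that down, sketched here under my own assumptions: compare the Content-Type header against image/jpeg and check that the body starts with the JPEG signature bytes FF D8 FF. The function name looks_like_jpeg and its two arguments are hypothetical, and the caller is expected to supply the header value and the first few bytes of the body. This is a sanity check only, not a defence against a hostile server.

```python
# Every JPEG file starts with these three bytes.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(content_type, first_bytes):
    """Return True if both the header and the leading bytes suggest a JPEG."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type == "image/jpeg" and first_bytes.startswith(JPEG_MAGIC)
```

An HTML error page served with status 200 would fail both checks, which is exactly the case a bare status test misses.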

Carson Myers
A: 

Try it with mechanize:

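import mechanize

br = mechanize.Browser()
# Treat a redirect as a failure instead of following it
br.set_handle_redirect(False)
try:
    # open_novisit fetches the URL without changing the browser state
    br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
    print 'OK'
except Exception:
    print 'KO'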
systempuntoout
A: 

Thanks for all the responses, everyone. I ended up using the following:

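import urllib2

try:
    # url holds the address to check; urlopen follows redirects by default
    f = urllib2.urlopen(urllib2.Request(url))
    deadLinkFound = False
except urllib2.URLError:
    # HTTPError (404 and friends) is a subclass of URLError
    deadLinkFound = True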