I'm looping through an array of URL strings of images hosted at an external site.

It looks something like this:

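def get_image_urls
  # map (not each + puts) so the helper returns the image tags
  # instead of printing them to stdout and returning the raw URL array
  image_url_array.map do |image_url|
    image_tag image_url
  end
end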

This returns tags for the images hosted on the external site. The problem is that some of these images might be broken (404). For example:

get_image_urls
# These would return image_tags, but for brevity...
=> "http://someothersite.com/images/1.jpg"
   "http://someothersite.com/images/2.jpg"
   "http://someothersite.com/images/3.jpg" # <-- (Broken: 404)
   "http://someothersite.com/images/4.jpg"
   "http://someothersite.com/images/5.jpg" # <-- (Broken: 404)

What I'm looking to do is replace the URLs of the broken images with a "missing" image hosted on my own site. So using the example above, with 3.jpg and 5.jpg being broken, I want something like this returned:

get_image_urls
# These would return image_tags, but for brevity...
=> "http://someothersite.com/images/1.jpg"
   "http://someothersite.com/images/2.jpg"
   "http://mysite.com/images/missing.png"
   "http://someothersite.com/images/4.jpg"
   "http://mysite.com/images/missing.png"
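Put another way, the goal is a map over the URL list that substitutes the local placeholder wherever a URL turns out to be broken. A minimal sketch of just that substitution step, assuming a hypothetical `broken` predicate (any callable that takes a URL and returns true or false); how to implement that predicate is the actual open question:

```ruby
# Hypothetical placeholder image hosted on your own site.
MISSING_IMAGE_URL = 'http://mysite.com/images/missing.png'

# Returns a new array in which every URL flagged by the +broken+ predicate
# is replaced with MISSING_IMAGE_URL; all other URLs pass through unchanged.
def replace_broken_urls(urls, broken)
  urls.map { |url| broken.call(url) ? MISSING_IMAGE_URL : url }
end

urls = (1..5).map { |n| "http://someothersite.com/images/#{n}.jpg" }
# Stand-in predicate that simply "knows" 3.jpg and 5.jpg are broken.
known_broken = ->(url) { url.end_with?('/3.jpg', '/5.jpg') }
puts replace_broken_urls(urls, known_broken)
```

In the Rails helper, each resulting URL would then be passed to `image_tag` as before.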

Is there a simple way to solve this problem? Thanks so much in advance.

+2  A: 

Can't you do a simple request for the image and check whether it's a 404? It's not perfect, but it would catch the bulk of them. Whether that's practical depends on how often you are running it. If you don't have direct access to the files on the server, an HTTP request is the only way to check. For speed, you could do a HEAD (headers-only) request. It needs a code example; let me dig one out for you...

It depends on what the server returns. If you can get the headers and the server just serves a standard 404 page, you can check the Content-Length to see whether the response is too small to be an image. That sounds a bit hacky, but it would work (there's a better way afterwards). Something like:

(Taken and modified from http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000682.)

require 'net/http'

response = nil
Net::HTTP.start('www.example.com', 80) {|http|
  response = http.head('/myimage.jpg')
}
# Assume anything with a content-length of at least 1000 bytes must be an image?
# You will need to tweak/tune this to your server and your definition of what a good image is.
# The header value is a String (or nil if absent), so convert it before comparing.
p "Good image" unless response['content-length'].to_i < 1000

Alternatively (and really this is the right way to do it), check the HTTP status code, since that's the definitive way for the server to tell you whether the image is there or not. You don't have to download the whole image for that: a HEAD response carries the status line along with the headers, so `response.code` on the result of `http.head` above gives you the status as a string, e.g. "404" for a missing image (see the `head` and `request` methods in the docs linked above for details).
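A minimal sketch of the status-code check using Ruby's standard `Net::HTTP`, assuming the remote host accepts HEAD requests (some servers answer HEAD with a 405, in which case a GET would be needed). The `image_available?` name, the 2-second timeouts, and the choice to treat redirects, malformed URLs and network errors as "broken" are illustrative assumptions, not part of the answer above:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true only if a HEAD request for +url+ returns a 2xx status.
# Only the status line and headers are fetched; the image body is not downloaded.
# Assumption: 3xx redirects, 4xx/5xx responses, non-HTTP or malformed URLs,
# and network failures or timeouts all count as "not available".
def image_available?(url, timeout: 2)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP

  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    http.head(uri.request_uri).is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  false
end
```

Something like `image_available?(url) ? url : 'http://mysite.com/images/missing.png'` could then decide which URL goes into `image_tag`. Since every call is a network round trip, you would probably cache the result rather than re-check on each page render.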

Hope that helps though.

Pete Duncanson
+3  A: 

I don't think it is possible to check the availability of remote images without periodic requests, as Pete described.
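One way to keep such periodic requests cheap is to remember each result for a while and only re-check once it goes stale. A minimal sketch, assuming an in-process Hash is good enough (a Rails app might prefer `Rails.cache` or a database column); the `CachedImageChecker` class, the one-hour default TTL, and the injected `checker` and `clock` callables are all hypothetical:

```ruby
# Hypothetical cache that runs an expensive availability check for a URL at
# most once per +ttl+ seconds; calls in between reuse the stored answer.
# Not thread-safe; wrap access in a Mutex if it is shared across threads.
class CachedImageChecker
  def initialize(checker, ttl: 3600,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @checker = checker # callable: url -> true/false (e.g. a HEAD-request check)
    @ttl = ttl
    @clock = clock
    @results = {} # url => [available, checked_at]
  end

  def available?(url)
    now = @clock.call
    cached = @results[url]
    return cached[0] if cached && now - cached[1] < @ttl

    result = @checker.call(url)
    @results[url] = [result, now]
    result
  end
end
```

Injecting the clock keeps the class testable without sleeping; in normal use you would pass only the checker.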

But maybe you'll find a trick I once used (with jQuery) useful:

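$('img').one('error', function () {
  // .error() was removed in jQuery 3.0; .one('error', ...) binds the same event
  // and unbinds after the first failure, so a broken fallback can't loop forever.
  $(this).attr('src', '<<<REPLACE URL>>>');
});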

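In the error handler you can replace the host in the image URL with your own. Note that the handler only catches errors that occur after it has been bound.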

You can also collect this information by having clients send AJAX POSTs to your host, and only after a certain number of such errors check the image with Pete's method. That radically decreases the number of checks needed.
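The server side of that idea could be as small as a per-URL counter that only triggers a real check once enough clients have reported the image as failing. A minimal Ruby sketch, assuming the AJAX POSTs end up calling a hypothetical `report_error` method (for example from a controller action) and that 3 reports is a sensible threshold; the class name, the threshold, and the injected `checker` are illustrative:

```ruby
# Hypothetical tracker: counts client-side load errors per image URL and only
# runs the (expensive) server-side check once a URL reaches +threshold+ reports.
class BrokenImageReports
  def initialize(checker, threshold: 3)
    @checker = checker       # callable: url -> true if the image is available
    @threshold = threshold
    @counts = Hash.new(0)    # url => error reports since the last check
    @confirmed_broken = {}   # url => true once a server-side check fails
  end

  # Records one client report; returns true if the URL is confirmed broken.
  def report_error(url)
    return true if @confirmed_broken[url]

    @counts[url] += 1
    return false if @counts[url] < @threshold

    @counts[url] = 0 # a healthy image needs a fresh batch of reports to be re-checked
    if @checker.call(url)
      false
    else
      @confirmed_broken[url] = true
    end
  end

  def broken?(url)
    @confirmed_broken.fetch(url, false)
  end
end
```

Only URLs for which `broken?` returns true would then be swapped for the local "missing" image on the server side.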

Roman Golomidov
I appreciate the input. Unfortunately this time I need to do this all server-side, but will keep this in mind for the future.
Bryan Woods
Thanks, Roman, that's a nice client-side fix. Will be adding that to the toolbox ;)
Pete Duncanson