ansaurus

Question

If a URL doesn't have a filename in it, can I determine if it is leading to an image?

Answer 1

+2 A:

You could do a HEAD request and check the Content-Type header of the response for the MIME type.

See: http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000682
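A minimal sketch of the HEAD approach with Ruby's standard Net::HTTP might look like the following. The helper names image_content_type? and image_url? are made up for illustration, and the sketch assumes the server answers HEAD requests honestly; it does not follow redirects or rescue network errors.

```ruby
require "net/http"
require "uri"

# True when a Content-Type header value names an image type,
# e.g. "image/jpeg" or "image/png; charset=binary".
# (Hypothetical helper, not part of Net::HTTP.)
def image_content_type?(content_type)
  !content_type.nil? && content_type.strip.downcase.start_with?("image/")
end

# Sends a HEAD request (headers only, no body) and inspects Content-Type.
# Redirects are not followed: a 3xx response simply yields false.
def image_url?(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) && image_content_type?(response["content-type"])
end
```

Calling image_url?("http://example.com/some/path") (a placeholder URL) would then return true or false depending on what the server reports.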

The response you get in your example is the image itself. You could also try to determine whether or not it is a picture by using a utility like file [1] or an image library like ImageMagick [2].

[1] http://unixhelp.ed.ac.uk/CGI/man-cgi?file
[2] http://rmagick.rubyforge.org/

reto 2010-10-11 07:05:13

The text returned is the binary data of the image, to be written to a file or database. You should check the MIME type in the header information as suggested here. If you need to pull down the file contents as well, you can check the headers of that same GET response, so no separate HEAD request is needed.

Jeremy 2010-10-11 07:13:09

Answer 2

+1 A:

It looks like the REST Client response wraps Ruby's Net::HTTPResponse, so if res is the result from RestClient.get you should be able to do:

res.net_http_res.header['content-type']

and see whether it starts with image/, e.g. image/jpeg for a JPEG image.

If you don't actually need a copy of the image and just need to check what the URL points to, then you are better off doing a HEAD request as reto suggests. (This avoids receiving an unnecessary copy of the body content.)

mikej 2010-10-11 07:11:33

Answer 3

A:

I did this about 5 years ago in PHP. Sadly, I don't have the code any more. Basically, I used curl with the option to follow all redirects. That way, the data returned to the program was nearly always what I actually wanted to test. From that point, I would grab only the first few bytes of the content and check whether the image metadata existed and whether the file was a JPG, PNG, or GIF. Having the code to show you would probably explain this a lot better, but it's gone. I likened this to "tasting" a file before eating it.

The advantage of using this kind of technique is that you're actually checking the file instead of relying on header info, which is usually correct but not always.
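Since the original PHP code is lost, here is a rough Ruby sketch of the same "tasting" idea, not a reconstruction of it. It follows redirects by hand and stops reading once enough bytes have arrived. The names resolve_redirect and fetch_prefix, the 16-byte default, and the redirect limit of 5 are all assumptions chosen for illustration.

```ruby
require "net/http"
require "uri"

# Resolves a Location header (absolute or relative) against the current URL.
def resolve_redirect(current_uri, location)
  URI.join(current_uri.to_s, location)
end

# Follows up to max_redirects redirects, then returns up to byte_count bytes
# from the start of the body as a binary string. Returns nil on failure.
def fetch_prefix(url, byte_count = 16, max_redirects = 5)
  uri = URI.parse(url)
  (max_redirects + 1).times do
    prefix = "".b
    location = nil
    status_ok = false
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request_get(uri.request_uri) do |response|
        if response.is_a?(Net::HTTPRedirection)
          location = response["location"]
        elsif response.is_a?(Net::HTTPSuccess)
          status_ok = true
          response.read_body do |chunk|
            prefix << chunk.b
            break if prefix.bytesize >= byte_count
          end
        end
        break # leave the request block so the rest of the body is not downloaded
      end
    end
    return prefix.byteslice(0, byte_count) if status_ok
    return nil if location.nil?
    uri = resolve_redirect(uri, location)
  end
  nil # too many redirects
end
```

The returned bytes can then be compared against known image signatures instead of trusting the Content-Type header.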

Geuis 2010-10-11 07:16:42

Answer 4

+1 A:

Your best bet is the Content-Type header, but if all else fails you can derive the image format from the initial set of bytes:

JPG: 0xFF 0xD8
PNG: 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
GIF: 'G' 'I' 'F'

Search for "<format> file format", replacing <format> with each of the other file formats you need to identify.
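The byte signatures listed above could be checked in Ruby roughly like this. The constant IMAGE_SIGNATURES and the method detect_image_format are illustrative names, and the sketch covers only the three formats from the list.

```ruby
# Leading byte signatures for the three formats listed above.
IMAGE_SIGNATURES = {
  jpeg: "\xFF\xD8".b,
  png:  "\x89PNG\r\n\x1A\n".b, # 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
  gif:  "GIF".b
}.freeze

# Returns :jpeg, :png, :gif, or nil when no signature matches.
def detect_image_format(bytes)
  data = bytes.b # compare as raw bytes regardless of the string's encoding
  match = IMAGE_SIGNATURES.find { |_format, signature| data.start_with?(signature) }
  match && match.first
end
```

Only the first eight bytes of the content are needed for these three checks, so there is no need to download the whole file before deciding.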

Johannes Gorset 2010-10-11 07:17:57
