This URL takes you to an image, but has no file extension to check a regex against:

http://www.tonymooreillustration.com/gallery/main.php?g2_view=core.DownloadItem&g2_itemId=393

I'm using RestClient (HTTP and REST client for Ruby) in my app, so I tried doing this:

RestClient.get "http://www.tonymooreillustration.com/gallery/main.php?g2_view=core.DownloadItem&g2_itemId=393"

I get back lots of text that begins like this:

"\377???JFIF\000\001\002\001\000H\000H\000\000\377?cExif\000\000MM\000*\000\000\000\b\000\a\001\022\000\003\000\000\000\001\000\001\000\000\001\032\000\005\000\000\000\001\000\000\000b\001\e\000\005\000\000\000\001\000\000\000j\001(\000\003\000\000\000\001\000\002\000\000\0011\000\002\000\000\000\024\000\000\000r\0012\000\002\000\000\000\024\000\000\000\206\207i\000\004\000\000\000\001\000\000\000\234\000\000\000?\000\000H\000\000\000\001\000\000\000H\000\000\000\001Adobe Photoshop 7.0\0002005:07:12 02:58:19\000\000\000\000\003\240\001\000\003\000\000\000\001\377\377\000\000\240\002\000\004\000\000\000\001\000\000\001?\000\004\000\000\000\001\000\000\002?\000\000\000\000\000\006\001\003\000\003\000\000\000

Is there a way I can use this to determine if the URL is pointing at an image?

+2  A: 

You could do a HEAD request and check the header for MIME information.

See: http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000682
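A minimal sketch of the HEAD approach using only Ruby's standard library. The method names image_content_type? and image_url? are made up for illustration, and redirects are not followed here:

```ruby
require 'net/http'
require 'uri'

# True when a Content-Type value names an image type, e.g. "image/jpeg"
# or "image/png; charset=binary". A missing header (nil) counts as false.
def image_content_type?(content_type)
  content_type.to_s.strip.downcase.start_with?('image/')
end

# Issues a HEAD request, so only headers are transferred, and checks
# the Content-Type header. Redirects are not followed in this sketch.
def image_url?(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request_head(uri.request_uri)
    image_content_type?(response['content-type'])
  end
end
```

You would call image_url? with the gallery URL from the question. Keep in mind that some servers answer HEAD differently from GET, so treat the result as a hint rather than proof.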

The response you get in your example is the image itself. You could also try to determine whether or not this is a picture by using a utility like file [1] or an image library like ImageMagick [2].

[1] http://unixhelp.ed.ac.uk/CGI/man-cgi?file [2] http://rmagick.rubyforge.org/

reto
The text returned is the binary data of the image, which you could write to a file or database. You should check the MIME type in the header information, as suggested here. If you need to pull down the file contents as well, you can check the headers of that same response without making a separate HEAD request.
Jeremy
+1  A: 

It looks like the REST Client response wraps Ruby's Net::HTTPResponse, so if res is the result from RestClient.get, you should be able to do:

res.net_http_res.header['content-type']

and check whether the value starts with image/ (for example, image/jpeg for a JPEG image).

If you don't actually need a copy of the image and just need to check what the URL points to, then you are better off doing a HEAD request, as reto suggests. (This avoids receiving an unnecessary copy of the body content.)

mikej
A: 

I did this about 5 years ago in PHP. Sadly, I don't have the code any more. Basically, I used curl with the option to follow all redirects. That way, the data being returned to the program was nearly always what I actually wanted to test. From that point, I would grab only the first few bytes of the content and check whether the image metadata existed and whether the format was jpg, png, or gif. Having the code to show you would probably explain this a lot better, but it's gone. I likened this to "tasting" a file before eating it.

The advantage of using this kind of technique is that you're actually checking the file instead of relying on header info, which is usually correct but not always.
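Since the original PHP is lost, here is a rough Ruby sketch of the same "tasting" idea, not a reconstruction of that code. It assumes the server honours the Range header; one that ignores it will send the whole body. The names sniff_request and leading_bytes, and the defaults of 16 bytes and 5 redirects, are arbitrary choices for illustration:

```ruby
require 'net/http'
require 'uri'

# Builds a GET request that asks for only the first byte_count bytes.
def sniff_request(uri, byte_count)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Range'] = "bytes=0-#{byte_count - 1}"
  # Ask for uncompressed data so the leading bytes are the file's own.
  request['Accept-Encoding'] = 'identity'
  request
end

# Follows up to max_redirects redirects and returns the leading bytes of
# the final response body, or nil if the request fails.
def leading_bytes(url, byte_count = 16, max_redirects = 5)
  uri = URI.parse(url)
  (max_redirects + 1).times do
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(sniff_request(uri, byte_count))
    end
    case response
    when Net::HTTPRedirection
      location = response['location']
      return nil unless location
      uri = URI.join(uri.to_s, location) # Location may be relative
    when Net::HTTPSuccess
      # Trim in case the server ignored Range and sent the full body.
      return response.body.to_s.byteslice(0, byte_count)
    else
      return nil
    end
  end
  nil # gave up: too many redirects
end
```

The returned bytes can then be compared against known image signatures.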

Geuis
+1  A: 

Your best bet is the Content-Type header, but if all else fails you can derive the image format from the initial set of bytes:

  • JPG: 0xFF 0xD8
  • PNG: 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
  • GIF: 'G' 'I' 'F'

Search for <format> file format, replacing <format> with the other file formats you need to identify.
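As a sketch, the three signatures listed above can be checked in Ruby like this. The constant SIGNATURES and the method image_format are names chosen for illustration, and data is assumed to be the raw response body (or at least its first 8 bytes):

```ruby
# Leading byte signatures for the three formats listed above.
SIGNATURES = {
  jpeg: "\xFF\xD8".b,
  png:  "\x89PNG\r\n\x1A\n".b, # 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
  gif:  "GIF".b
}.freeze

# Returns :jpeg, :png, or :gif when data starts with a known signature,
# and nil otherwise.
def image_format(data)
  bytes = data.to_s.b # compare as raw bytes, ignoring string encoding
  match = SIGNATURES.find { |_format, magic| bytes.start_with?(magic) }
  match && match.first
end
```

A match only says the data begins like that format; it does not prove the rest of the file is a valid image.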

Johannes Gorset