If a page is actually an image, how reliably will its Content-Type header say so? Should I use the header as the sole method for determining whether a page (URL) is an image?
A:
The good news is that I've seen images served with the wrong Content-Type more often than I've seen non-images served with an image Content-Type. So I would say that an "image" Content-Type is a pretty good indicator.
I'm curious though. If you're already going through the trouble of requesting a resource, what's the big deal with downloading the whole thing to test if it's an image?
Frank Krueger
2009-12-21 05:15:18
Well that's not very good either. That means I'd be rejecting images because they've set the content-type incorrectly. I *am* downloading the whole thing either way, but I need to know if I should try parsing it as HTML or interpreting it as an image.
Mark
2009-12-21 05:30:05
@Mark **Most** image decoders can bail out at the first sign of trouble. Just try loading the data as an image. If it works, then it's an image. If it doesn't, then it's something else. (I say **most** because I used to specialize in security of imaging codecs and know firsthand that most free image decoders are susceptible to a variety of attacks.)
Frank Krueger
2009-12-22 03:49:12
@Frank: Yeah, I guess that's the best way to go about it then. I'm using C#'s `System.Drawing.Image.FromStream()`; I figure that should be pretty robust. Thanks!
Mark
2009-12-24 23:23:54
@Mark Yeah it's "pretty" robust. :-) There are a few sites you shouldn't expose it to though.
Frank Krueger
2009-12-25 00:00:41
@Frank: Like what? And why not?
Mark
2010-01-02 01:48:28
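The idea from the comments above (don't rely on the header alone, look at the bytes you downloaded) can be sketched roughly as follows. This is not the `System.Drawing.Image.FromStream()` approach Mark settled on; it swaps in a cheaper signature check on the first few bytes, written in Python using only the standard library. The function names `sniff_image_format` and `classify_response` and the short signature table are assumptions made up for this illustration, and the table covers only PNG, JPEG, and GIF.

```python
from typing import Optional

# Leading byte signatures for a few common image formats.
# Illustrative and deliberately incomplete (assumption: these three are enough here).
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known image signature, else None."""
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def classify_response(content_type: str, body: bytes) -> str:
    """Return "image", "html", or "unknown" for a downloaded response.

    The body is inspected first, so an image served with the wrong
    Content-Type is still recognized. The header is only a fallback hint.
    """
    if sniff_image_format(body) is not None:
        return "image"
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    return "unknown"
```

A signature match only says the data starts like an image. A real decoder still has to parse the rest, and as Frank points out, the decoder itself may not be safe to run on hostile input.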