I'm writing a program that downloads information from the web and part of that is images.

At the moment I have a problem: the code that downloads the images is separate from the code that displays them (the program follows MVC). If the server returns a 404, or the image download fails in some other way, the display code pops up a message prompt, which I would like to avoid.

Is there an easy way to check to see if an image is valid? I'm only concerned about jpg, gif and png.

Note: I don't care about reading the image data; I just want to check that the file is in a valid image format.

+4  A: 

Do you want to check whether the download would be successful? Or do you want to check that what is downloaded is, in fact, an image?

In the former case, the only way to check is to try to access it and see what kind of HTTP response code you get. You can send an HTTP HEAD request to get the response code without actually downloading the image, but if you're just going to go ahead and download the image anyway (if it's successful) then sending a separate HEAD request seems like a waste of time (and bandwidth).
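
As a rough illustration of checking the response code: most HTTP libraries hand you the status as an integer, but if all you have is the raw status line, a minimal sketch could look like this. The function names `http_status_code` and `http_status_is_success` are hypothetical, and the parser assumes an HTTP/1.x style status line.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: extract the numeric status code from an
   HTTP/1.x status line such as "HTTP/1.1 404 Not Found".
   Returns -1 if the text does not look like such a line. */
int http_status_code(const char *status_line)
{
    int major = 0, minor = 0, code = 0;

    if (status_line == NULL)
        return -1;
    if (sscanf(status_line, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}

/* Only 2xx responses carry the resource that was requested. */
int http_status_is_success(int code)
{
    return code >= 200 && code <= 299;
}
```

With this, the download code can refuse to hand a body to the display code unless `http_status_is_success` returns non-zero, whether the request was a HEAD or a GET.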

Alternatively, if you really want to check that what you're downloading is a valid image file, you have to read the whole file to check it for corruption. But if you just want to check that the file extension is accurate, it should be enough to check the first few bytes of the file. All GIF images start with the ASCII text GIF87a or GIF89a, depending on which GIF specification is used. PNG images start with the byte 0x89 followed by the ASCII text PNG (the full signature is the eight bytes 89 50 4E 47 0D 0A 1A 0A). JPEG images start with the bytes FF D8 FF; in the JPEGs I looked at the fourth byte was E0, which a hex dump that prints little-endian 16-bit words shows as d8ff e0ff. (You should check these against the format specifications; Wikipedia has links.) Keep in mind, though, that to get even the first few bytes of the image, you will need to send an HTTP request, which could return a 404 (and in that case you don't have any image to check).
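
A minimal sketch of such a first-bytes check, assuming the start of the download is already in memory. It compares against the full signatures (GIF87a or GIF89a, the eight-byte PNG signature 89 50 4E 47 0D 0A 1A 0A, and FF D8 FF for JPEG). The names `sniff_type` and `sniff_image` are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum { SNIFF_UNKNOWN = 0, SNIFF_GIF, SNIFF_PNG, SNIFF_JPEG } sniff_type;

/* Guess the image format from the first bytes of a buffer.
   len is the number of valid bytes in buf; short buffers are
   reported as SNIFF_UNKNOWN instead of being read past the end. */
sniff_type sniff_image(const unsigned char *buf, size_t len)
{
    static const unsigned char png_sig[8] =
        { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

    if (buf == NULL)
        return SNIFF_UNKNOWN;
    if (len >= 6 && (memcmp(buf, "GIF87a", 6) == 0 ||
                     memcmp(buf, "GIF89a", 6) == 0))
        return SNIFF_GIF;
    if (len >= 8 && memcmp(buf, png_sig, 8) == 0)
        return SNIFF_PNG;
    if (len >= 3 && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF)
        return SNIFF_JPEG;
    return SNIFF_UNKNOWN;
}
```

An HTML error page saved under an image file name comes back as `SNIFF_UNKNOWN`, which is the case the question is worried about. A match only says the file starts like an image; it says nothing about whether the rest is intact.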

David Zaslavsky
I think that this is a good answer. It's just a shame that using the expected web methods can be ruined by server-side scripting misbehaving. Still, the advice in the second paragraph is sound. :)
jheriko
+1  A: 

If you really want to know if an image file is valid, you actually have to decode it (although you don't need to store the bits). This is because the file might be the wrong size or might be corrupted.

If you're using an HTTP library to do the downloads, you should be able to examine the header and know that you're getting a 404 error and not a real payload. Look at the documentation for the library you're using.

If you're getting back a file and you want to see if it's probably an image without fully decoding it, then you'll need to check at least the headers for validity. libpng and libjpeg offer pretty low-level access to PNG and JPEG files, respectively. You could also look at higher-level libraries like ImageMagick, Microsoft's MFC, or whatever library is most appropriate for your platform.
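
To give an idea of what a header check involves without pulling in libpng, here is a rough sketch for PNG only. It relies on the PNG layout: an eight-byte signature, then the IHDR chunk (a four-byte big-endian length of 13, the type "IHDR", then width and height as four-byte big-endian integers, neither of which may be zero). The function name `png_read_dimensions` is hypothetical, and this does not check the chunk CRC or anything after the header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian integer from p. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: returns 1 and fills width/height if buf starts
   with a PNG signature followed by a plausible IHDR chunk, else 0.
   Needs the first 24 bytes of the file. */
int png_read_dimensions(const unsigned char *buf, size_t len,
                        uint32_t *width, uint32_t *height)
{
    static const unsigned char png_sig[8] =
        { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
    uint32_t w, h;

    if (buf == NULL || width == NULL || height == NULL || len < 24)
        return 0;
    if (memcmp(buf, png_sig, 8) != 0)
        return 0;
    if (be32(buf + 8) != 13 || memcmp(buf + 12, "IHDR", 4) != 0)
        return 0;

    w = be32(buf + 16);
    h = be32(buf + 20);
    if (w == 0 || h == 0)   /* zero dimensions are invalid in PNG */
        return 0;

    *width = w;
    *height = h;
    return 1;
}
```

This catches truncated or mislabeled files early, but a file can pass it and still fail to decode, so it is a cheap filter and not a substitute for a real decoder.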

Mr Fooz
+1  A: 

Thanks for the answers, guys. I have already downloaded the file, so I went with just checking the magic number: the front end I use (wxWidgets) already has image libraries, and I wanted something very light.

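#include <stdint.h>

// Return codes. These definitions were not shown in the original post,
// so the actual values here are an assumption.
enum { IMAGE_VOID = 0, IMAGE_GIF, IMAGE_PNG, IMAGE_JPG };

// Classify a file by its magic number.
// h must point to at least the first 4 bytes of the file.
uint8_t UTIL_isValidImage(const unsigned char h[4])
{
    // "GIF8" (matches both GIF87a and GIF89a)
    if (h[0] == 'G' && h[1] == 'I' && h[2] == 'F' && h[3] == '8')
        return IMAGE_GIF;

    // 0x89 "PNG"
    if (h[0] == 0x89 && h[1] == 'P' && h[2] == 'N' && h[3] == 'G')
        return IMAGE_PNG;

    // FF D8 (JPEG start-of-image marker)
    if (h[0] == 0xFF && h[1] == 0xD8)
        return IMAGE_JPG;

    return IMAGE_VOID;
}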
Lodle
A: 

When you GET a resource through HTTP, you must use the Content-Type header to determine how to process the content. If you've already downloaded it to a local file, the information that a real web browser relies upon is already lost. In many cases, the URL will match the Content-Type (e.g. http://example.com/image.png is served up as Content-Type: image/png). However, you cannot rely on this.
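
A small sketch of what acting on the Content-Type header could look like, assuming your HTTP library gives you the header value as a string. The function name `content_type_is_supported_image` is made up, and the accepted list only covers the three formats from the question.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n characters of a and b, ignoring ASCII case. */
static int equals_ignore_case(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if a Content-Type header value names
   JPEG, GIF or PNG. Parameters after ';' are ignored, and the media
   type is matched case-insensitively. */
int content_type_is_supported_image(const char *value)
{
    static const char *const accepted[] =
        { "image/jpeg", "image/gif", "image/png" };
    size_t i, len;

    if (value == NULL)
        return 0;
    while (*value == ' ' || *value == '\t')
        value++;

    /* The media type ends at ';', whitespace, or end of string. */
    len = strcspn(value, "; \t\r\n");
    for (i = 0; i < sizeof accepted / sizeof accepted[0]; i++) {
        if (len == strlen(accepted[i]) &&
            equals_ignore_case(value, accepted[i], len))
            return 1;
    }
    return 0;
}
```

A 404 page served as `text/html` is rejected here before the body ever reaches the display code. Since servers can mislabel content, this works best as a first filter alongside a check of the bytes themselves.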

Tom
The problem is when the web server gives you a 404 error page instead of the image.
Lodle
A 404 response that contains any content should still specify a Content-Type if it expects to be renderable.
Tom