Suppose you have a thumbnail generator script that accepts source images in the form of a URL. Is there a way to detect if the source URL is "broken", whether it does not exist or leads to a non-image file?


Simply brute-forcing it with getimagesize() or another PHP GD function is not a solution, since stray or spoofed URLs that are not images at all (http://example.com/malicious.exe, or the same file renamed to http://example.com/malicious.jpg) could be passed in. Such cases could easily be detected in PHP before GD is invoked. I'm looking for a way to pre-sanitize the input before GD tries to parse the file.


As a first step, the following regular expression checks whether the URL ends in an image extension: preg_match('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)([^\s]+(\.(?i)(jpg|png|gif|bmp))$)@', $txt,$url);
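A minimal sketch of the same extension pre-check without a hand-written URL regex, assuming the URL has already been isolated into its own string. The function name `has_image_extension` and its default extension list (copied from the regex above) are hypothetical, and the check only looks at the URL text, so a renamed malicious.jpg still passes it.

```php
<?php
// Hypothetical helper (not from the original post): checks only the
// extension in the URL's path. It says nothing about what the server
// actually returns for that URL.
function has_image_extension(string $url, array $allowed = ['jpg', 'png', 'gif', 'bmp']): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $path   = parse_url($url, PHP_URL_PATH);

    // Reject anything that is not plain http/https.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return false;
    }
    // Reject URLs without a path component (or that failed to parse).
    if (!is_string($path) || $path === '') {
        return false;
    }

    // The query string is not part of the path, so "?x=1" is ignored here.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}
```

For example, `has_image_extension('http://example.com/malicious.exe')` is false, while the renamed `http://example.com/malicious.jpg` is true, which is why this can only ever be a first filter.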

+5  A: 

Use the file_exists function in PHP; you can check URLs with it.

See the documentation below. It shows how to check images, which is exactly what you need.

FILE EXISTS - http://www.php.net/manual/en/function.file-exists.php#93572

URL EXISTS - http://www.php.net/manual/en/function.file-exists.php#85246


Here is alternative code for checking the URL. If you test it in a browser, replace \n with <br/>.

<?php

$urls = array('http://www.google.com/images/logos/ps_logo2.png', 'http://www.google.com/images/logos/ps_logo2_not_exists.png');

foreach($urls as $url){
   echo "$url - ";
   echo url_exists($url) ? "Exists" : 'Not Exists';
   echo "\n\n";
}


function url_exists($url) {
    $hdrs = @get_headers($url);

    echo @$hdrs[1]."\n";

    return is_array($hdrs) ? preg_match('/^HTTP\\/\\d+\\.\\d+\\s+2\\d\\d\\s+.*$/',$hdrs[0]) : false;
}
?>

The output is as follows:

http://www.google.com/images/logos/ps_logo2.png - Content-Type: image/png
Exists

http://www.google.com/images/logos/ps_logo2_not_exists.png - Content-Type: text/html; charset=UTF-8
Not Exists
Alex
I use this all the time for detecting includes in my home-brew MVC setup. Works great and you can easily point to a default not-found image if the file doesn't exist.
smdrager
1) `file_exists` can only check local files (?) 2) Also, url_exists looks like it is just invoking `curl_exec`. Does `curl_exec` return false if the $url is "broken" or (more specifically) has the wrong header type?
ina
@ina see the modification to my answer. I added a url_exists function and an example. Hope this helps you.
Alex
get_headers would be slow; use cURL with curl_multi_init
RobertPitt
A: 

You could check the HTTP status code (it should be 200) and the Content-type header (image/png etc.) of the HTTP response before you put the actual image through the generator.

If these two preconditions are ok, after retrieving the image you can call getimagesize() on it and see if it breaks, what MIME type it returns etc.
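A rough sketch of those two preconditions, assuming the header lines come from `get_headers($url)` in its default (non-associative) form. The function name `headers_look_like_image` is hypothetical, and the check trusts whatever the server claims, so it cannot catch a faked Content-Type.

```php
<?php
// Hypothetical helper: inspects an array of raw header lines, such as the
// one returned by get_headers($url). It only reports what the server claims.
function headers_look_like_image(array $headers): bool
{
    $status = null;
    $contentType = null;

    foreach ($headers as $line) {
        if (!is_string($line)) {
            continue;
        }
        // get_headers() includes one status line per redirect hop;
        // keep the last one, since that is the final response.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $line, $m)) {
            $status = (int) $m[1];
            $contentType = null; // headers from earlier hops no longer apply
        } elseif (stripos($line, 'Content-Type:') === 0) {
            $contentType = strtolower(trim(substr($line, strlen('Content-Type:'))));
        }
    }

    // Both preconditions: final status 200 and an image/* content type.
    return $status === 200
        && $contentType !== null
        && strpos($contentType, 'image/') === 0;
}
```

It could be called as `$h = @get_headers($url); $ok = is_array($h) && headers_look_like_image($h);`, with getimagesize() still run on the downloaded file afterwards.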

Alex Ciminian
is there a way to check for files that fake image content headers?
ina
I don't think you can, without retrieving it. See my update :).
Alex Ciminian
A: 
For local files, try:

<?php 
if (file_exists($filename))
{
    // do what you want
}
else
{
    // report an error: the file does not exist
}
?>

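For external domains:

$headers = @get_headers($url);
// get_headers() returns false on failure, so check before indexing;
// match the status code in the status line rather than "200" anywhere
if (is_array($headers) && preg_match('#^HTTP/\S+\s+200\b#', $headers[0])) {
    // file exists
} else {
    // file doesn't exist
}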

You can also use a cURL request for the same purpose.

Yogesh
this only works for local files - what if it's pointing to an external domain `http://anotherdomain.com/image.jpg`
ina
@ina, I forgot to add that code. Thanks for pointing it out.
Yogesh
A: 

Did you try the file_get_contents() function?

http://php.net/manual/en/function.file-get-contents.php

pMan
This will cause unnecessary data transfer just to check whether the file exists. Better to fetch only the headers; that is less work for the server.
Alex
@Alex: Given the OP's determination to detect faked MIME headers, there is no way *without* file_get_contents. +1 to even out the score
Pekka
+1  A: 

The only really reliable way is to request the image using file_get_contents() and find out its image type using getimagesize().

Only if getimagesize() returns a valid file type can you rely on it being a valid image.

This is quite resource heavy, though.
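The download-then-inspect approach described above might look roughly like this. It is a sketch that swaps in `getimagesizefromstring()` (PHP 5.4+), which applies the same format detection as getimagesize() to bytes already in memory. The function name `detect_image_mime` is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the detected MIME type, or null when the
// bytes do not start with a recognisable image file header.
function detect_image_mime(string $bytes): ?string
{
    if ($bytes === '') {
        return null;
    }

    // Inspects the file-format header of the in-memory data.
    // Warnings and notices for unreadable data are suppressed on purpose.
    $info = @getimagesizefromstring($bytes);

    if ($info === false || empty($info['mime'])) {
        return null;
    }
    return $info['mime'];
}
```

A caller could fetch the data with `$bytes = @file_get_contents($url);` and then use `is_string($bytes) ? detect_image_mime($bytes) : null`, only handing the file to the thumbnail generator when the result is an expected type such as `image/png`.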

You could consider not doing any server-side checks at all, and adding an onerror JavaScript event to the finished image resource:

<img src="..." onerror="this.style.display = 'none'">
Pekka
what if the file is not an image, e.g. a renamed text file or some other file?
ina
it's all in vain. he's refused to understand
Col. Shrapnel
@ina in that case, `getimagesize` will fail and you will know it is not an image.
Pekka
you're assuming too much of the php gd installation. on the current server, it will try to parse the file regardless, even if it's an .exe file or something - this is extra parsing in invoking gd when you can first check headers or extension to weed out such strays.
ina
also, why use `file_get_contents()` when `curl()` has caching built in?
ina
@ina you have no idea what you are talking about. `getimagesize()` has nothing to do with GD. It checks the *file format headers* (i.e. the first few bytes of the file) which is as good as a GD check, but without the overhead. It is the only waterproof and performance conscious way to parse an image file, take it or leave it. Regarding `curl` - of course you can use that instead of file_get_contents(), but I don't really see the point, especially about caching. What good is caching when you are trying to validate a resource?
Pekka