What PHP script technique runs the fastest in detecting if a remote image does not exist before I include the image? I mean, I don't want to download all the bytes of the remote image -- just enough to detect if it exists.

And while on the subject but with just a slight deviation, I'd like to download just enough bytes to determine a JPEG's width and height information.

Speed is very important for the system I'm designing.

+2  A: 

Run a cURL request that does a HEAD request instead of a full GET.

I didn't test this, but hopefully you'll get the idea:

<?php
$url = 'http://www.example.com/image.gif';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_NOBODY, true); // this is what makes it a HEAD request
curl_exec($ch);

if (curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200) { // 200 = OK
    // image exists ..
}

curl_close($ch);
?>

See the cURL documentation for more information about cURL.

Matt
This is awesome. It's a start. However, I might be able to resolve both my problems in one fell swoop with fsockopen/fread if I can understand PNG and JPG image headers, not needing this curl technique. However, +1 for def. showing me something interesting I did not know.
Volomike
+2  A: 

You should be able to determine a JPEG's dimensions without loading up its entire contents. For baseline JPEGs, that is, non-progressive-scan JPEGs, scan in bytes until you come across 0xFFC0. Skip the next three bytes. The next two bytes indicate the height. They are followed by two more bytes that indicate the width.

For example, in "FF C0 00 11 08 01 DE 02 D0", 01DE represents a height of 478 and 02D0 represents a width of 720.
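As a rough sketch of the scan described above (not tested against real-world files): the helper name jpegSizeFromBytes is made up for illustration, and it assumes the first 0xFFC0 byte pair in the chunk really is the frame marker, which a naive scan cannot guarantee.

```php
<?php
// Hypothetical helper: find the first baseline frame marker (0xFFC0) in a
// chunk of JPEG bytes and return array(width, height), or null if the marker
// or the dimension bytes are not inside the chunk.
function jpegSizeFromBytes($sData) {
    $nPos = strpos($sData, "\xFF\xC0");
    if ($nPos === false || strlen($sData) < $nPos + 9) {
        return null;
    }
    // Skip the marker (2 bytes) plus the next three bytes (segment length and
    // sample precision), then read two big-endian 16-bit values: height, width.
    $aSize = unpack('nheight/nwidth', substr($sData, $nPos + 5, 4));
    return array($aSize['width'], $aSize['height']);
}

// The example bytes from this answer
$sSample = "\xFF\xC0\x00\x11\x08\x01\xDE\x02\xD0";
list($nWidth, $nHeight) = jpegSizeFromBytes($sSample);
echo $nWidth . 'x' . $nHeight . "\n"; // 720x478
```

Progressive JPEGs use a different marker, so this sketch only covers the baseline case the answer talks about.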

amoebob
+1  A: 

I'd send a GET request that contains a RANGE header to limit the actual data transfer where possible (the remote server might not honour the RANGE request but it's still worth a try). It probably doesn't make much difference whether you use sockets (directly) or curl to make the requests. But... you never know without benchmarks. For curl take a look at the "CURLOPT_RANGE" option at http://docs.php.net/function.curl-setopt

It probably doesn't fit your profile ("several an hour, on a server with only slim CPU power available.") but you might want to try handling multiple URLs at a time, i.e. having multiple active connections and only handling those that won't block on a read operation. If the limiting factor is mostly/only CPU power ...forget this part. For sockets, take a look at stream_select(); for curl, see curl_multi_exec().
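A minimal sketch of the curl_multi route, assuming the curl extension is loaded and that a HEAD request is enough for the existence check. The function name fetchStatusCodes and the 5-second timeout are made up for illustration, and I have not benchmarked this.

```php
<?php
// Hypothetical helper: send HEAD requests to several URLs in parallel and
// return an array of URL => HTTP status code (0 if the request failed).
function fetchStatusCodes(array $aURLs) {
    $aStatus = array();
    if (!$aURLs) {
        return $aStatus;
    }
    $hMulti = curl_multi_init();
    $aHandles = array();
    foreach ($aURLs as $sURL) {
        $hCurl = curl_init($sURL);
        curl_setopt($hCurl, CURLOPT_NOBODY, true);         // HEAD request
        curl_setopt($hCurl, CURLOPT_RETURNTRANSFER, true); // don't echo anything
        curl_setopt($hCurl, CURLOPT_TIMEOUT, 5);           // arbitrary value
        curl_multi_add_handle($hMulti, $hCurl);
        $aHandles[$sURL] = $hCurl;
    }
    // Drive all transfers; curl_multi_select() waits for socket activity so
    // the loop does not spin the CPU while the requests are in flight.
    do {
        $nResult = curl_multi_exec($hMulti, $nRunning);
        if ($nRunning) {
            curl_multi_select($hMulti, 1.0);
        }
    } while ($nRunning && $nResult == CURLM_OK);
    foreach ($aHandles as $sURL => $hCurl) {
        $aStatus[$sURL] = curl_getinfo($hCurl, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($hMulti, $hCurl);
        curl_close($hCurl);
    }
    curl_multi_close($hMulti);
    return $aStatus;
}
```

Calling fetchStatusCodes() with a list of image URLs should give back one status code per URL, and anything other than 200 can be treated as "image missing".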

If the curl module is unavailable you can also use the http url wrapper in combination with stream_context_create() to send a request containing a RANGE header.
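Something along these lines might do for the url wrapper route. It is only a sketch: the helper names makeRangeContext and fetchFirstBytes, the 300-byte default, and the 5-second timeout are all assumptions of mine.

```php
<?php
// Hypothetical helper: build an HTTP stream context that asks the server for
// only the first $nBytes bytes of a resource.
function makeRangeContext($nBytes) {
    return stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'header'  => 'Range: bytes=0-' . ($nBytes - 1) . "\r\n",
            'timeout' => 5, // seconds; an arbitrary value for this sketch
        ),
    ));
}

// Hypothetical helper: read at most $nBytes from $sURL, or return false if
// the stream cannot be opened (for HTTP that includes a 404 response).
function fetchFirstBytes($sURL, $nBytes = 300) {
    $hStream = @fopen($sURL, 'rb', false, makeRangeContext($nBytes));
    if (!$hStream) {
        return false;
    }
    // The length cap stops the read after $nBytes even if the server ignores
    // the Range header and starts sending the whole file.
    $sData = stream_get_contents($hStream, $nBytes);
    fclose($hStream);
    return $sData;
}
```

A server that honours the header answers with 206 Partial Content and sends only the requested bytes; one that ignores it answers 200, and the read is simply cut short on the client side.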

Looks like you've already figured out what to do with the data once you've received it.

VolkerK
A: 

Store the images locally. That's a very simple and guaranteed solution.

Col. Shrapnel
I have limited server CPU and disk space in this case. Basically need hundreds of blogs to have remote images that work.
Volomike
Greed is one of the deadly sins, @Volomike. If you can't afford image storage - just don't go for hundreds of blogs. Make one.
Col. Shrapnel
The images are from public domain sources like Wikipedia, Flickr, and other CreativeCommons sources. And I'm doing this for my client at their request, on their overloaded server that we're trying to reduce bottlenecks on with these blogs.
Volomike
Actually, you could *create* another bottleneck, hehe. Well, the greed note goes to him.
Col. Shrapnel
I guess I could create a script to snag the image URL *and* click a site's ads, to kind of pay penance?
Volomike
A: 

I think the following routine will retrieve just the image height for JPG, GIF, and PNG files, or return FALSE (check it with ===) on a 404 or an unsupported image type. The routine also uses the fewest server resources, because the file_get_contents() route appears to download the whole file even with a byte restriction added, and getimagesize() downloads the file as well. You can see the performance hit of those compared to this.

The way this routine works is that it downloads just the first 300 bytes of the file. Unfortunately, JPEG pushes its height value pretty far out in the file, unlike GIF or PNG, so I had to read that far. Then it scans those bytes for JFIF, PNG, or GIF to work out which file type it is. Once we have that, we use a separate routine for each type to parse the header. Note that JPEG must first use unpack() with H* (hex string, high nibble first) and then scan for ffc2 or ffc0 and process what follows. GIF, however, must first unpack() with h* (low nibble first; big difference there).

This function was created by me with trial and error, and could be wrong. I ran it on several images and it appears to work well. If you find a fault in it, consider letting me know.

Anyway, this system lets me determine an image's height, discard the image if it is too tall, and find another. For whatever random image I find, I set the width in the IMG tag of the HTML and the browser scales the height automatically, but that looks good only if the image is under a certain height. The function also doubles as a 404 check, catching images that no longer exist or whose host prohibits cross-site linking. Since I am manually setting the images to a fixed width, I don't care to read the image width. You can adapt this function and usually look just a few bytes further along to find the image width, should you want to.

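function getImageHeight($sURL) {
  // fopen() signals a 404 or other failure by returning FALSE; it does not throw
  $hSock = @fopen($sURL, 'rb');
  if (!$hSock) {
    return FALSE;
  }
  // One fread() on a network stream may return fewer bytes than requested,
  // so keep reading until 300 bytes have arrived or the stream ends
  $vData = '';
  while (strlen($vData) < 300 && !feof($hSock)) {
    $sChunk = fread($hSock, 300 - strlen($vData));
    if ($sChunk === FALSE || $sChunk === '') {
      break;
    }
    $vData .= $sChunk;
  }
  fclose($hSock);
  if (strpos($vData, 'JFIF') !== FALSE) {
    $asResult = unpack('H*', $vData);
    $sBytes = $asResult[1];
    // Find the first byte-aligned ffc0 or ffc2 marker, skip the 6 hex digits
    // of length and precision that follow it, and capture the 4-digit height
    if (preg_match('/^(?:[0-9a-f]{2})*?ffc[02][0-9a-f]{6}([0-9a-f]{4})/', $sBytes, $asMatch)) {
      return hexdec($asMatch[1]);
    }
  } elseif (strpos($vData, 'GIF') === 0 && strlen($vData) >= 10) {
    $asResult = unpack('h*', $vData);
    $sBytes = $asResult[1];
    // Bytes 8-9 hold the height, low byte first; 'h*' also puts the low
    // nibble first, so reversing the 4 hex digits gives a normal hex number
    $sBytes = strrev(substr($sBytes, 16, 4));
    return hexdec($sBytes);
  } elseif (strpos($vData, 'PNG') === 1 && strlen($vData) >= 24) {
    // The IHDR height is a 32-bit big-endian integer at bytes 20-23
    $asResult = unpack('N', substr($vData, 20, 4));
    return $asResult[1];
  }
  return FALSE;
}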
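
If you do want the width as well, a sketch like the following might be a starting point for GIF and PNG. The helper name gifOrPngSize is made up, and it works on a string of header bytes you have already downloaded rather than on a URL.

```php
<?php
// Hypothetical companion helper: given the first bytes of a GIF or PNG,
// return array(width, height), or null if the header is not recognised.
function gifOrPngSize($sData) {
    if (strncmp($sData, 'GIF', 3) === 0 && strlen($sData) >= 10) {
        // GIF: width at bytes 6-7, height at bytes 8-9, both little-endian
        $aSize = unpack('vwidth/vheight', substr($sData, 6, 4));
    } elseif (strncmp($sData, "\x89PNG", 4) === 0 && strlen($sData) >= 24) {
        // PNG: IHDR width at bytes 16-19, height at bytes 20-23, both big-endian
        $aSize = unpack('Nwidth/Nheight', substr($sData, 16, 8));
    } else {
        return null;
    }
    return array($aSize['width'], $aSize['height']);
}
```

In both formats the width sits directly before the height, so no extra bytes need to be downloaded for it.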
Volomike