I am trying to build a script that retrieves a list of thumbnail images from an external link, much like Facebook does when you share a link and can choose the thumbnail image that is associated with that post.

My script currently works like this:

  • file_get_contents on the URL
  • preg_match_all to match any <img src="" in the contents
  • Works out the full URL to each image and stores it in an array
  • If there are fewer than 10 images, it loops through them and uses getimagesize to find each width and height
  • If there are 10 or more images, it loops through them and uses fread and imagecreatefromstring to find each width and height (for speed)
  • Once all widths and heights are known, it loops through again and adds only the images that meet a minimum width and height to a new array (so only larger images are shown; smaller images are less likely to be descriptive of the URL)
  • Each image's new dimensions are worked out (scaled down proportionally) and the images are returned as...

<img src="'.$image[0].'" width="'.$image[1].'" height="'.$image[2].'"><br><br>
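For anyone following along, the preg_match_all step from the list above could look roughly like this. It is only a sketch: the function name extract_image_srcs is made up for illustration, and it returns the src values exactly as written in the HTML, so relative paths still need to be resolved against the page URL afterwards.

```php
<?php
// Hypothetical helper (not from the original script): collect the src value
// of every <img> tag found in an HTML string.
function extract_image_srcs($html) {
    $srcs = array();
    // Match src="..." or src='...' inside an <img ...> tag, case-insensitively.
    // The \s before "src" avoids matching attributes such as data-src.
    if (preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is', $html, $matches)) {
        foreach ($matches[2] as $src) {
            $src = trim($src);
            if ($src !== '') {
                $srcs[] = $src;
            }
        }
    }
    // Drop duplicates and re-index so the result is a plain list.
    return array_values(array_unique($srcs));
}
```

Removing duplicates up front means the same image is never measured twice, which helps a little with the speed concern.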

At the moment this works fine, but there are a number of problems I can potentially have:

  1. SPEED! If the URL has many images on the page it will take considerably longer to process
  2. MEMORY! Using getimagesize or fread & imagecreatefromstring will store the whole image in memory; any large image on the page could eat up the server's memory and kill my script (and server)

One solution I have found is to retrieve the image width and height from the header of the image without downloading the whole image, though I have only found code that does this for JPGs (it would also need to support GIF & PNG).

Can anyone make any suggestions to help me with either problem mentioned above? Or perhaps you can suggest another way of doing this; I am open to ideas... Thanks!
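For GIF and PNG the dimensions sit at fixed offsets near the start of the file, so a header-only check could be sketched as below. The function name dimensions_from_header is hypothetical, and it assumes the caller has already read roughly the first 24 bytes of the file into $bytes.

```php
<?php
// Hypothetical helper: read width/height from the first bytes of a PNG or GIF.
// $bytes is assumed to hold at least the first 24 bytes of the file.
function dimensions_from_header($bytes) {
    // PNG: 8-byte signature, then the IHDR chunk (4-byte length + "IHDR"),
    // then width and height as 4-byte big-endian integers at offsets 16 and 20.
    if (strlen($bytes) >= 24
        && substr($bytes, 0, 8) === "\x89PNG\r\n\x1a\n"
        && substr($bytes, 12, 4) === 'IHDR') {
        $size = unpack('Nwidth/Nheight', substr($bytes, 16, 8));
        return array($size['width'], $size['height']);
    }
    // GIF: "GIF87a" or "GIF89a", then the logical screen width and height
    // as 2-byte little-endian integers at offsets 6 and 8.
    if (strlen($bytes) >= 10
        && (substr($bytes, 0, 6) === 'GIF87a' || substr($bytes, 0, 6) === 'GIF89a')) {
        $size = unpack('vwidth/vheight', substr($bytes, 6, 4));
        return array($size['width'], $size['height']);
    }
    // Unknown format or too few bytes.
    return false;
}
```

The bytes could come from something like stream_get_contents($handle, 24) on a handle opened with fopen($url, 'rb'). Note that for GIF this is the logical screen size, which normally matches the image but is not guaranteed to.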

** Edit: Code below:

// Example images array
$images = array('http://blah.com/1.jpg', 'http://blah.com/2.jpg');

// Find the image sizes
$image_sizes = $this->image_sizes($images);

// Find the images that meet the minimum size
$img = array(); // initialise so the output loop is safe when nothing qualifies
for ($i = 0; $i < count($image_sizes); $i++) {
    if ($image_sizes[$i][0] >= $min || $image_sizes[$i][1] >= $min) {
        // Scale down the original image size
        $dimensions = $this->resize_dimensions($scale_width, $scale_height, $image_sizes[$i][0], $image_sizes[$i][1]);
        $img[] = array($images[$i], $dimensions['width'], $dimensions['height']);
    }
}

// Output the images
foreach ($img as $image) echo '<img src="'.$image[0].'" width="'.$image[1].'" height="'.$image[2].'"><br><br>';

/**
 * Retrieves the image sizes
 * Uses the getimagesize() function or the filesystem for speed increases
 */
public function image_sizes($images) {
    $out = array();
    if (count($images) < 10) {
        foreach ($images as $image) {
            list($width, $height) = @getimagesize($image);
            if (is_numeric($width) && is_numeric($height)) {
                $out[] = array($width, $height);
            }
            else {
                $out[] = array(0, 0);
            }
        }
    }
    else {
        foreach ($images as $image) {
            $handle = @fopen($image, "rb");
            $contents = "";
            if ($handle) {
                while(true) {
                    $data = fread($handle, 8192);
                    if (strlen($data) == 0) break;
                    $contents .= $data;
                }
                fclose($handle);
                $im = @imagecreatefromstring($contents);
                if ($im) {
                    $out[] = array(imagesx($im), imagesy($im));
                    // Free the bitmap only when one was actually created
                    imagedestroy($im);
                }
                else {
                    $out[] = array(0, 0);
                }
            }
            else {
                $out[] = array(0, 0);
            }
        }
    }
    return $out;
}

/**
 * Calculates restricted dimensions with a maximum of $goal_width by $goal_height 
 */
public function resize_dimensions($goal_width, $goal_height, $width, $height) {
    $return = array('width' => $width, 'height' => $height);

    // If the ratio > goal ratio and the width > goal width resize down to goal width
    if ($width/$height > $goal_width/$goal_height && $width > $goal_width) {
        $return['width'] = floor($goal_width);
        $return['height'] = floor($goal_width/$width * $height);
    }

    // Otherwise, if the height > goal, resize down to goal height
    else if ($height > $goal_height) {
        $return['width'] = floor($goal_height/$height * $width);
        $return['height'] = floor($goal_height);
    }   
    return $return;
}
A: 

The only idea that comes to mind for your current approach (which is impressive) is to check the HTML for existing width and height attributes and skip the file read process altogether if they exist.
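A rough sketch of that check, in case it helps. The function names declared_image_sizes and tag_attribute are made up for this example, and it only trusts plain integer values, so something like width="50%" is treated as not declared.

```php
<?php
// Hypothetical helper: return the value of one attribute from a single tag,
// or null when the attribute is absent.
function tag_attribute($tag, $name) {
    $pattern = '/\s' . preg_quote($name, '/') . '\s*=\s*(["\'])(.*?)\1/is';
    return preg_match($pattern, $tag, $m) ? $m[2] : null;
}

// Hypothetical helper: list every <img> tag with the width and height declared
// in the markup (null when missing or not a plain integer).
function declared_image_sizes($html) {
    $out = array();
    preg_match_all('/<img\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        $w = tag_attribute($tag, 'width');
        $h = tag_attribute($tag, 'height');
        $out[] = array(
            'src'    => tag_attribute($tag, 'src'),
            'width'  => ($w !== null && preg_match('/^\d+$/D', $w)) ? (int) $w : null,
            'height' => ($h !== null && preg_match('/^\d+$/D', $h)) ? (int) $h : null,
        );
    }
    return $out;
}
```

Images that come back with both values set could skip the file read; everything else would still go through the existing size check.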

Cryo
Yes, I have considered that, but most sites don't bother with the width and height attributes. Also, you could have a 1px x 10px image and stretch the width to 800px to create a line (old school I know, but people still do it), which my script would treat as a valid image that meets the minimum width requirement!
fire
Very true. Perhaps this method would allow you to weed out the images that don't meet the minimum requirements at the least. I'm stumped right now for other ideas, +1 to hopefully get some traction.
Cryo
+1  A: 

getimagesize reads only the header, but imagecreatefromstring reads the whole image. An image read by GD, ImageMagick or GraphicsMagick is stored as a bitmap, so it consumes width*height*(3 or 4) bytes, and there's nothing you can do about it. The best solution for your problem is to make a curl multi request (see http://ru.php.net/manual/en/function.curl-multi-select.php ), and then process the received images one by one with GD or any other library. To lower memory consumption a bit, you can store the image files on disk rather than in memory.
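Something along these lines, as a minimal sketch rather than a finished implementation. The function name fetch_images_to_disk and the $timeout parameter are made up for the example; it assumes the curl extension is loaded and treats anything other than an HTTP 200 with a non-empty body as a failure.

```php
<?php
// Hypothetical helper: download several URLs in parallel, streaming each body
// to a temp file. Returns an array mapping each URL to its downloaded file.
function fetch_images_to_disk(array $urls, $timeout = 10) {
    $mh = curl_multi_init();
    $jobs = array();
    foreach ($urls as $url) {
        $path = tempnam(sys_get_temp_dir(), 'img');
        $fp = fopen($path, 'wb');
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FILE, $fp);           // write the body to disk, not memory
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $jobs[] = array($url, $path, $fp, $ch);
    }
    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);                            // avoid a busy loop if select fails
        }
    } while ($running > 0 && $status === CURLM_OK);

    $files = array();
    foreach ($jobs as $job) {
        list($url, $path, $fp, $ch) = $job;
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
        fclose($fp);                                   // flush before checking the size
        if ($code === 200 && filesize($path) > 0) {
            $files[$url] = $path;
        } else {
            unlink($path);                             // discard failed downloads
        }
    }
    curl_multi_close($mh);
    return $files;
}
```

Each returned path can then be passed to getimagesize one at a time and removed with unlink afterwards, so only one image is ever being inspected at once.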

poiuyttr
thanks I'll give it a go..
fire