Using getimagesize() is very slow, especially if you're scraping a site and pulling in many images. PHP has to download the entirety of each image BEFORE it can pass the data to getimagesize(), so if you're working on (for instance) a large photo gallery, you could be downloading many megabytes per image.
There are a few things you can do to speed up the process:
Check the height/width attributes of the <img> tag and only grab images where either is larger than 50. They might not be accurate, as the web page creator could be stretching or shrinking the image, but it would save you from downloading small images whose attributes are accurate.
Instead of fetching the images directly with getimagesize(), you could try to fetch only the first couple hundred bytes of each, which will contain the image header information. For GIF/JPEG images, the height/width will be very near the beginning of the file, so you'd save on file transfer overhead.
Increase your script's execution time. Fetching all the images will naturally be a fairly slow process, and you'll most likely run up against PHP's max_execution_time limit.
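To make the "first couple hundred bytes" idea concrete, here is a rough sketch. The function names sizeFromHeader() and fetchImageHead() are made up for illustration, and the URL in the usage note is a placeholder. It only handles GIF and PNG (PNG is my addition here). JPEG is left out because its dimensions sit in a SOF segment that can come after variable-length metadata, so it needs a marker scan rather than a fixed offset. The fetch relies on allow_url_fopen being enabled.

```php
<?php
// Hypothetical helper: read width/height from the leading bytes of a GIF or PNG.
// Returns [width, height], or null if the format is not recognised.
function sizeFromHeader(string $bytes): ?array
{
    // GIF: 6-byte signature, then width and height as 16-bit little-endian integers.
    if (strlen($bytes) >= 10
        && (strncmp($bytes, 'GIF87a', 6) === 0 || strncmp($bytes, 'GIF89a', 6) === 0)) {
        $d = unpack('vw/vh', substr($bytes, 6, 4));
        return [$d['w'], $d['h']];
    }

    // PNG: 8-byte signature, 4-byte chunk length, "IHDR",
    // then width and height as 32-bit big-endian integers.
    if (strlen($bytes) >= 24
        && substr($bytes, 0, 8) === "\x89PNG\r\n\x1a\n"
        && substr($bytes, 12, 4) === 'IHDR') {
        $d = unpack('Nw/Nh', substr($bytes, 16, 8));
        return [$d['w'], $d['h']];
    }

    return null;
}

// Hypothetical helper: fetch only the first $length bytes of a remote file.
// file_get_contents() stops reading once $length bytes have arrived.
// Returns the bytes read, or false on failure.
function fetchImageHead(string $url, int $length = 256)
{
    return @file_get_contents($url, false, null, 0, $length);
}
```

You'd use it roughly like this: $head = fetchImageHead('http://example.com/photo.gif'); and then $size = ($head !== false) ? sizeFromHeader($head) : null;. If you get null back, fall back to getimagesize() for that image.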
Comment follow-up:
Well, if there's no height/width, then you can jump straight to fetching the image (or first bit of the image) and extracting height/width directly. Checking the height/width in the tag is just to save you the trouble of having to fetch the image in the first place.
As for extracting the height/width from the HTML, it's just a matter of calling ->getAttribute('width') and ->getAttribute('height') once you've found an <img> tag with Simple HTML DOM. Something like this:
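    // Requires the Simple HTML DOM library (simple_html_dom.php) to be included.
    $dom = file_get_html('http://example.com/somepage.html');
    $images = $dom->find('img');
    foreach ($images as $img) {
        $h = $img->getAttribute('height');
        $w = $img->getAttribute('width');
        // A missing attribute may not come back as null, and values like "100%"
        // can't be compared to 50, so test for a numeric value instead.
        if (!is_numeric($h) || !is_numeric($w)) {
            // height and/or width not available in tag, so fetch image and get size that way
            $h = ...
            $w = ...
        }
        if (($h >= 50) && ($w >= 50)) {
            // image is at least 50x50, so display it...
        }
    }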
This probably won't work if you cut/paste it, since I'm just writing it off the top of my head, but it should be enough to get you started.
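If it helps, here's one possible way to structure the "use the tag, otherwise measure the image" decision so it can be tried without hitting the network. imageDimensions() and isLargeEnough() are names I made up; the $measure argument is any callable that takes a URL and returns an array starting with width and height (getimagesize() has that shape), or false on failure. Treat it as a sketch under those assumptions, not a drop-in solution.

```php
<?php
// Hypothetical helper: prefer the tag's attributes, fall back to measuring the image.
// Returns [width, height], or null if the size could not be determined.
function imageDimensions($attrW, $attrH, string $url, callable $measure): ?array
{
    // Both attributes present and numeric: no need to fetch anything.
    if (is_numeric($attrW) && is_numeric($attrH)) {
        return [(int) $attrW, (int) $attrH];
    }

    // Otherwise ask the measuring callable (e.g. 'getimagesize').
    $info = $measure($url);
    if (!is_array($info) || !isset($info[0], $info[1])) {
        return null;
    }
    return [(int) $info[0], (int) $info[1]];
}

// True when both dimensions meet the minimum (50 by default, as in the loop above).
function isLargeEnough(?array $dim, int $min = 50): bool
{
    return $dim !== null && $dim[0] >= $min && $dim[1] >= $min;
}
```

Inside the foreach you'd call something like imageDimensions($img->getAttribute('width'), $img->getAttribute('height'), $img->src, 'getimagesize'). Keep in mind that src is often a relative path, so you'd need to turn it into an absolute URL before fetching.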