Hello, my company has recently run into problems with image handling for our websites.

We have several websites (adult entertainment) that display images like DVD covers, snapshots and similar. We have about 100,000 movies, and for each movie we have an average of 30 snapshots plus covers. Almost every image has an additional version with blurring and overlay for non-members, which results in about 50 images per movie, or a total of 5 million base images. Each image is available in several versions, depending on where it's placed on the page (thumbnail, original, small preview, not-so-small preview, small image in the top list, etc.), which results in more images than I cared to count.

Now I had the idea to use a server that generates the images on the fly, since it has become quite clumsy to generate all the different images for all the different pages (different pages sometimes even need different image sizes for basically the same task).

Does anyone know of an image processing server that can scale down images on the fly, so we only need to provide the original images and the web guys can just request whatever size they need?

Requirements:

  • Very high performance (several thousand users per day)
  • On-the-fly blurring and overlay creation
  • On-the-fly resize (with and without keeping aspect ratio)
  • Can handle millions of images
  • Must be able to read JPG, GIF, PNG and BMP and convert between them

Security is not that much of a concern: the unblurred images can already be reached by URL manipulation, so while more security would be nice, it's not required. Frankly, I've stopped caring, after failing to get into my coworkers' heads why it's a bad idea (for our small reseller page) to use http://example.com/view_image.php?filename=/data/images/01020304.jpg to display the images.

We tried PHP scripts to do this but the performance was too slow for this many users.

Thanks in advance for any suggestions you have.

+6  A: 

Based on

We tried PHP scripts to do this but the performance was too slow for this many users.

I'm going to assume you weren't caching the results. I'd recommend caching the resulting images for a day or two: have your script check whether the thumbnail has already been generated; if so, use it; if not, generate it on the fly, as sketched below.
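
A minimal sketch of that check-then-generate pattern in PHP using the GD extension; the paths, the 160x120 size, and the one-day freshness window are illustrative assumptions, not details from the question:

<?php
// Check-then-generate thumbnail sketch (illustrative paths, GD assumed).
$id     = basename($_GET['id']);                 // basename() avoids path traversal
$source = '/data/images/originals/' . $id . '.jpg';
$cached = '/data/images/cache/' . $id . '_160x120.jpg';

// Serve the cached thumbnail if it exists and is less than a day old.
if (is_file($cached) && filemtime($cached) > time() - 86400) {
    header('Content-Type: image/jpeg');
    readfile($cached);
    exit;
}

// Otherwise generate it once, write it to the cache, then serve it.
$original = imagecreatefromjpeg($source);
$thumb    = imagecreatetruecolor(160, 120);
imagecopyresampled($thumb, $original, 0, 0, 0, 0, 160, 120,
                   imagesx($original), imagesy($original));
imagejpeg($thumb, $cached, 85);

header('Content-Type: image/jpeg');
readfile($cached);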

This would improve performance dramatically, as I'd imagine the main/start page gets far more hits than random video X; when viewing the main page, no images have to be created because they're already cached. When user Y views movie X, they won't notice the delay as much, since only the images for that one page have to be generated.

For the "on-the-fly resize" aspect: how important is bandwidth to you? I'd assume you're going through so much of it with movies that a few extra KB in images per request wouldn't do much harm. If that's the case, you could just serve larger images, set the width and height attributes, and let the browser do the scaling for you.

BarrettJ
Even better than caching them locally... request the images via a CDN and then the CDN will cache all the generated images for you and serve them much faster than you can, with cheaper bandwidth costs. That's how we do it and it's extremely effective.
Greg Beech
+1 I absolutely second this! Why set up an expensive on-the-fly server when what you need can be achieved with household tools?
Pekka
@Greg Beech: it may be hard to find a CDN which will be happy to cache these kind of files.
cherouvim
@cherouvim - I would be surprised if you can't find a CDN happy to cache these kind of files. It's what you pay them for after all. We use Akamai, who do a grand job.
Greg Beech
+2  A: 

The ImageCache and Image Exact Sizes solutions from the Drupal community might do this; like most OSS solutions, they use the ImageMagick libraries.

There are some AMIs for Amazon's EC2 service that do image scaling. One used Amazon S3 for image storage (originals and scaled versions) and could feed them through to Amazon's CDN service (CloudFront). Check the EC2 site for what's available.

Another option is Google. Google Docs now supports all file types, so you can upload the images to a Google Docs folder and share the folder for public access. The URLs are kind of long, e.g.

http://lh6.ggpht.com/VMLEHAa3kSHEoRr7AchhQ6HEzHVTn1b7Mf-whpxmPlpdrRfPW216UhYdQy3pzIe4f8Q7PKXN79AD4eRqu1obC7I

Add the =s parameter to scale the image, cool! E.g. for 200 pixels wide:

http://lh6.ggpht.com/VMLEHAa3kSHEoRr7AchhQ6HEzHVTn1b7Mf-whpxmPlpdrRfPW216UhYdQy3pzIe4f8Q7PKXN79AD4eRqu1obC7I=s200

Google charges only USD 5/year for 20 GB. There is a full API for uploading docs, etc.

Other answers on SO: http://stackoverflow.com/questions/236139/how-best-to-resize-images-off-server/236264

TFD
Thanks for the good suggestion, but an external provider is (sadly) not an option: management won't approve it; we already tried to get something similar through.
dbemerlin
+1  A: 

OK, the first problem is that resizing an image in any language takes a little processing time. So how do you support thousands of clients? Well, you cache the results so you only have to generate each image once. The next time someone asks for that image, check whether it has already been generated; if it has, just return it. If you have multiple app servers, you'll want to cache to a central file system to increase your cache-hit ratio and reduce the amount of space you need.

In order to cache properly, you need a predictable naming convention that takes into account all the different ways you want each image displayed, i.e. use something like myimage_blurred_320x200.jpg to save a JPEG that has been blurred and resized to 320 width and 200 height, etc.
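
As an illustration, the cache path can be derived deterministically from the requested variant; the directory and function name here are hypothetical:

<?php
// Build a predictable cache filename from the image id and the requested
// variant, e.g. cachePath('myimage', true, 320, 200) returns
// '/data/images/cache/myimage_blurred_320x200.jpg'. (Illustrative paths.)
function cachePath($imageId, $blurred, $width, $height) {
    $variant = $blurred ? 'blurred' : 'clear';
    return sprintf('/data/images/cache/%s_%s_%dx%d.jpg',
                   basename($imageId), $variant, $width, $height);
}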

Another approach is to sit your image server behind a caching proxy; that way all the caching logic is done automatically for you, and your images are served by a fast, native web server.

You're not going to be able to serve millions of resized images any other way. That's how Google and Bing Maps do it: they pre-generate all the tiles they need for the world at different pre-set extents, so they can provide adequate performance by returning pre-generated static images.

If PHP is too slow, you should consider using the 2D graphics libraries from Java or .NET, as they are very rich and can support all your requirements. To get a flavour of the Graphics API, here is a method in .NET that resizes an image to the new width and/or height specified. If you omit the height or the width, it resizes while maintaining the correct aspect ratio. Note that an Image can be created from a JPG, GIF, PNG or BMP:

// Requires: using System; using System.Drawing; using System.Drawing.Drawing2D;
//
// Creates a resized image from the source image provided that retains its aspect ratio.
// -    If either the width or the height is not provided, the resized image uses the
//      proportion of the provided dimension to calculate the missing one.
// -    If both the width and the height are provided, the resized image has exactly those
//      dimensions, with the excess portions clipped evenly from the center of the image.
public static Image ResizeImage(Image sourceImage, int? newWidth, int? newHeight)
{
    if (newWidth == null && newHeight == null)
        throw new ArgumentException("At least one of newWidth or newHeight must be provided.");

    bool keepWholeImage = newWidth == null || newHeight == null;

    if (newWidth == null)
    {
        // Scale the width in proportion to the requested height.
        newWidth = (int)(sourceImage.Width * ((float)newHeight / sourceImage.Height));
    }
    else if (newHeight == null)
    {
        // Scale the height in proportion to the requested width.
        newHeight = (int)(sourceImage.Height * ((float)newWidth / sourceImage.Width));
    }

    var targetImage = new Bitmap(newWidth.Value, newHeight.Value);

    Rectangle srcRect;
    var destRect = new Rectangle(0, 0, newWidth.Value, newHeight.Value);

    if (keepWholeImage)
    {
        // Only one dimension was given, so the whole source is used unclipped.
        srcRect = new Rectangle(0, 0, sourceImage.Width, sourceImage.Height);
    }
    else
    {
        // Both dimensions were given: center-crop the source to the target
        // aspect ratio so the resized image is not distorted.
        float targetAspect = (float)newWidth.Value / newHeight.Value;
        float sourceAspect = (float)sourceImage.Width / sourceImage.Height;

        if (sourceAspect > targetAspect)
        {
            // Source is wider than the target: clip the width.
            int cropWidth = (int)(sourceImage.Height * targetAspect);
            srcRect = new Rectangle((sourceImage.Width - cropWidth) / 2, 0,
                                    cropWidth, sourceImage.Height);
        }
        else
        {
            // Source is taller than the target: clip the height.
            int cropHeight = (int)(sourceImage.Width / targetAspect);
            srcRect = new Rectangle(0, (sourceImage.Height - cropHeight) / 2,
                                    sourceImage.Width, cropHeight);
        }
    }

    using (var g = Graphics.FromImage(targetImage))
    {
        g.SmoothingMode = SmoothingMode.HighQuality;
        g.InterpolationMode = InterpolationMode.HighQualityBicubic;

        g.DrawImage(sourceImage, destRect, srcRect, GraphicsUnit.Pixel);
    }

    return targetImage;
}
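
For example, ResizeImage(img, 320, null) returns a 320-pixel-wide image with the height scaled proportionally, while ResizeImage(img, 320, 200) center-crops the source and resizes it to exactly 320x200.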
mythz
Thanks for the suggestions and the code. I'll benchmark it and check whether it's fast enough (and I'll create a PHP benchmark, too, to check whether my coworkers were right that "PHP is too slow").
dbemerlin
A: 

If each different image is uniquely identifiable by a single URL, then I'd simply use a CDN such as Akamai. Let your PHP script do the job and let Akamai handle the load.

Since this kind of business doesn't usually have budget problems, that'd be the only place I'd look.

Edit: that works only if you do find a CDN that will serve this kind of content for you.

cherouvim
I have to discard this solution, as the available budget is exactly 0 Euro (or for you US folks: exactly $0). We couldn't even get management to approve installing a test or development server (that development now takes us twice as long is apparently no problem, because developers cost nothing as they are already there...). Still, thanks for your suggestion.
dbemerlin
+2  A: 

I suggest you set up a dedicated web server to handle the image resizing and serve the final result. I have done something similar, although on a much smaller scale. It basically eliminates the explicit check for a cached copy.

It works like this:

  • you request the image by appending the required size to the filename, like http://imageserver/someimage.150x120.jpg
  • if the image exists, it will be returned with no other processing (this is the main point, the cache check is implicit)
  • if the image does not exist, handle the 404 Not Found via .htaccess and reroute the request to the script that generates the image at the required size (see the sketch after this list)
  • in the script, specify the list of allowed sizes to avoid attacks such as scripts requesting every possible size to shut your server down
  • keep this on a cookieless domain to minimize unnecessary traffic
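
A rough sketch of how the 404 rerouting could look, assuming Apache's ErrorDocument directive and PHP with the GD extension; the paths, the URL pattern, and the allowed-size list are all illustrative:

<?php
// Illustrative 404 handler (resize.php). In .htaccess, something like
//   ErrorDocument 404 /resize.php
// sends requests for missing files here; the originally requested URL
// is available in $_SERVER['REQUEST_URI'].

// Whitelist of allowed sizes, so attackers can't fill the disk by
// requesting every possible dimension.
$allowedSizes = array('150x120', '320x200', '640x480');

// Expect URLs like /someimage.150x120.jpg
if (!preg_match('#/(\w+)\.(\d+)x(\d+)\.jpg$#', $_SERVER['REQUEST_URI'], $m)
    || !in_array($m[2] . 'x' . $m[3], $allowedSizes, true)) {
    header('HTTP/1.0 404 Not Found');
    exit;
}
list(, $name, $width, $height) = $m;

// Generate the requested size from the original.
$original = imagecreatefromjpeg('/data/images/originals/' . $name . '.jpg');
$resized  = imagecreatetruecolor((int)$width, (int)$height);
imagecopyresampled($resized, $original, 0, 0, 0, 0,
                   (int)$width, (int)$height,
                   imagesx($original), imagesy($original));

// Save under the requested name so the next request is served directly
// by the web server (the implicit cache check), then output the result.
// Apache would otherwise keep the 404 status, so reset it to 200.
imagejpeg($resized, $_SERVER['DOCUMENT_ROOT'] . '/' . $name . '.' . $width . 'x' . $height . '.jpg', 85);
header('HTTP/1.0 200 OK');
header('Content-Type: image/jpeg');
imagejpeg($resized);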

EDIT: I don't think PHP itself would slow the process down much, as PHP scripting in this case is reduced to a minimum: the actual image scaling is done by a built-in library written in C. Whatever you do, you'll have to use such a library (GD, ImageMagick, or the like), so that's unavoidable. With my system you skip the overhead of an explicit cache check entirely, further reducing PHP's involvement. You can implement this on your existing server, so I guess it's a solution well suited to your budget.

kemp
Good suggestion. I will check whether it's possible to implement this without an additional web server, as we won't be able to get one for this.
dbemerlin
It's totally possible; I suggested the dedicated server just to share the load. In any case, even on a single server, consider using a separate virtual host for the last point I mentioned.
kemp
I'm accepting this solution now, since it seems to be the one that scales best with the least overhead; now I just have to get it through to management.
dbemerlin
A: 

I really like the solution suggested by kemp. We are looking into a very similar problem, except that we serve images for tens of thousands of different websites. It looks like Amazon might have a similar solution.

giarcJavaDummy