Hi guys!

Here's the thing. I have an e-commerce web site where people can upload a lot of pictures for their products. All the images are stored on Amazon S3. When we need a thumbnail or something, I check S3 to see if one is available. If not, I generate one, send it to S3, and display it in the browser. Every different thumbnail size gets stored on S3, and checking thumbnail availability on every request is getting expensive. I'm afraid I'll pay a lot once the site starts to get more attention (if it ever does...).

Thinking about alternatives, I considered keeping only the original images on S3 and processing the thumbnails on the fly at every request. I imagine I would pay for that in CPU usage, but I haven't run any benchmarks to see how far I can go. The upside is that I wouldn't spend money making requests and storing more images on S3, and I could cache everything in the user's browser. I know that's not entirely safe, which is why I'm bringing this question here.

What do you think? How would you solve this? Thanks in advance, guys! Bye!

+1  A: 

Keep a local cache of:

  1. Which images are in S3
  2. A cache of the most popular images

Then in both circumstances you have a local reference. If the image isn't in the local image cache, you can check the local list of S3 keys to see if it exists on S3. That saves on S3 traffic for your most popular items and saves latency when checking S3 for an item that isn't cached locally.
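A minimal sketch of the "most popular images" cache, capped by total size rather than entry count (the class name and sizes here are made up for illustration; a real deployment would use memcached or similar):

```python
from collections import OrderedDict

class SizeCappedCache:
    """Tiny LRU cache sketch: evicts least-recently-used entries
    once the total stored bytes exceed max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> raw image bytes

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as recently used
        return self.entries[key]

    def put(self, key, data):
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used += len(data)
        # Evict least-recently-used entries until under the cap.
        while self.used > self.max_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)

cache = SizeCappedCache(max_bytes=100)       # toy cap; think 1 GB in practice
cache.put("thumb-1.jpg", b"x" * 60)
cache.put("thumb-2.jpg", b"y" * 60)          # pushes thumb-1 out
print(cache.get("thumb-1.jpg"))              # → None (evicted)
```

Capping by bytes instead of entry count keeps the memory footprint predictable regardless of image sizes.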

Aiden Bell
Unfortunately, keeping a local reference for every image is not an option for me... maybe caching the most-used ones would help, but I would have to cap that at something...
Eber Freitas Dias
So, I've limited the image cache to 1 GB... it's been enough so far! tks :)
Eber Freitas Dias
+2  A: 

I would resize at upload time and store all versions in S3.

For example, if you have a large image (1200x1200, ~200kb) and create 3 resized versions (300x300, 120x120, and 60x60), you only add about 16%, or 32kb (for my test image, YMMV). Let's say you need to store a million images; that is roughly 30 GB more, or about $4.50 extra a month. Flickr reported having 2 billion images (in 2007); at that scale it's ~$9k extra a month, which is not too bad if you are that big.
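The arithmetic above, spelled out, assuming S3 storage at roughly $0.15 per GB-month (the going rate when this was written; check current pricing):

```python
# Assumption: S3 standard storage at ~$0.15 per GB-month (2009-era price).
PRICE_PER_GB_MONTH = 0.15

extra_kb_per_image = 32        # three resized versions, ~16% of a 200kb original
images = 1_000_000

extra_gb = extra_kb_per_image * images / (1024 * 1024)
extra_cost = extra_gb * PRICE_PER_GB_MONTH
print(f"{extra_gb:.1f} GB extra -> ${extra_cost:.2f}/month")
# → 30.5 GB extra -> $4.58/month
```

Scaling the same numbers to Flickr's reported 2 billion images gives about 61,000 GB extra, or roughly $9,150/month, matching the ~$9k figure above.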

Another major advantage is you will be able to use Amazon's CloudFront.

Ambirex
Creating multiple versions of an image and uploading them to S3 is very time consuming for the user. Maybe I could create a queue that does that on the server side instead... It's a good alternative!
Eber Freitas Dias
I have a similar setup, but I don't use PHP to resize; I call out to ImageMagick's convert. Since my images are likely to be read several times more often than they are written, I prefer to increase the speed of the reads at the cost of the writes.
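Shelling out to `convert` might look something like the sketch below (the function names are made up; `convert`'s `-resize` flag is real ImageMagick, which must be installed for the call to succeed):

```python
import subprocess

def convert_command(src, dest, size):
    """Build the ImageMagick command line for a resize,
    e.g. size="300x300" fits the image inside a 300x300 box."""
    return ["convert", src, "-resize", size, dest]

def resize_with_convert(src, dest, size):
    """Run the resize; raises CalledProcessError on failure.
    Requires ImageMagick's `convert` on the PATH."""
    subprocess.run(convert_command(src, dest, size), check=True)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with user-supplied filenames.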
Ambirex
+1  A: 

If you're proxying from S3 to your clients (which it sounds like you're doing), consider two optimizations:

  1. At upload time, resize the images all at once and upload them as a package (tar, XML, whatever)
  2. Cache these image packages on your front end nodes.

The 'image package' will reduce the number of PUT/GET/DELETE operations, which aren't free in S3. If you have 4 image sizes, you'll cut the request count down by a factor of 4.
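A sketch of the tar variant of the 'image package' using only Python's standard library (the function names are illustrative; the upload/download to S3 itself is out of scope here):

```python
import io
import tarfile

def pack_image_versions(versions):
    """Pack several resized versions of one image into a single tar
    archive so they can be stored and fetched as one S3 object.
    `versions` maps filename -> raw image bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in versions.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack_image_versions(blob):
    """Reverse of pack_image_versions: one GET recovers all sizes."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            out[member.name] = tar.extractfile(member).read()
    return out

package = pack_image_versions({
    "300x300.jpg": b"fake-large-thumb",
    "120x120.jpg": b"fake-medium-thumb",
    "60x60.jpg": b"fake-small-thumb",
})
```

One PUT stores all sizes and one GET retrieves them, trading a little transfer overhead for far fewer billable requests.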

The cache will further reduce S3 traffic, since I figure the workflow is usually: see a thumbnail -> click it for a larger image.

On top of that, you can implement a 'hot images' cache that is actively pushed to your web nodes so it's pre-cached if you're using a cluster.

Also, I don't recommend using Slicehost<->S3. The transit costs will kill you. You should really use EC2 to save a ton of bandwidth (money!).

If you aren't proxying but are handing your clients S3 URLs for the images, you'll definitely want to preprocess all of your images. Then you don't have to check for them; just pass the URLs to your client.

Re-processing the images on every request is costly. You'll find that if you can assume all images are already resized, the amount of work on your web nodes goes down and everything speeds up, especially since you aren't firing off multiple S3 requests.

Gary Richardson
I have no idea on how to upload packages... any resources? thanks :)
Eber Freitas Dias
Same as uploading any file (images, etc.)... I'm just suggesting you pack the multiple versions of the same image into a single file to reduce GETs.
Gary Richardson