This is about a web app that serves images. Since the same request will always return the same image, I want the accessing browsers to cache the images as aggressively as possible. I pretty much want to tell the browser:

Here's your image. Go ahead and keep it; it's really not going to change for the next couple of days. No need to come back. Really. I promise.

So far, I set

Cache-Control: public, max-age=86400
Last-Modified: (some time ago)
Expires: (two days from now)

and of course return a 304 Not Modified if the request has the appropriate If-Modified-Since header.
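
For readers who want to see the conditional-request part spelled out, here is a minimal Python sketch using only the standard library. The names `is_not_modified` and `respond` are made up for illustration, and the plain-dict request headers are an assumption; this is not the App Engine API and not necessarily how the app in question does it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_not_modified(if_modified_since, last_modified):
    """Return True when the client's cached copy is still current."""
    if not if_modified_since:
        return False
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: fall back to a full response
    if client_time.tzinfo is None:
        # Treat a date without a usable zone as UTC (an assumption).
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= client_time


def respond(request_headers, last_modified, body):
    """Return (status, headers, body) for an image request.

    `last_modified` must be a timezone-aware datetime in UTC.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if is_not_modified(request_headers.get("If-Modified-Since"), last_modified):
        return 304, headers, b""  # no body on a 304
    return 200, headers, body
```

The caching headers (Cache-Control, Expires) would be added to both the 200 and the 304 response in the same way.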

Is there anything else I can do (or anything I should do differently) to get my message across to the browsers?

The app is hosted on the Google App Engine, in case that matters.

+7  A: 

You may be interested in checking out the following Google Code article:

In a nutshell, all modern browsers should be able to cache your images appropriately as instructed, with those HTTP headers.

Daniel Vassallo
That's an interesting read; thanks.
balpha
A: 

Try an .htaccess file like this:

<ifmodule mod_gzip.c>
  mod_gzip_on Yes
  mod_gzip_dechunk Yes
  mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
  mod_gzip_item_include handler ^cgi-script$
  mod_gzip_item_include mime ^text/.*
  mod_gzip_item_include mime ^application/x-javascript.*
  mod_gzip_item_exclude mime ^image/.*
  mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</ifmodule>

<ifmodule mod_deflate.c>
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
AddType application/x-httpd-php .php
AddType application/x-httpd-php .php3
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE application/x-httpd-php
AddOutputFilterByType DEFLATE application/x-javascript
</ifmodule>

<ifmodule mod_expires.c>
  ExpiresActive On
  ExpiresDefault "access plus 1 seconds" 
  ExpiresByType text/html "access plus 1 seconds" 
  ExpiresByType image/gif "access plus 2592000 seconds" 
  ExpiresByType image/jpeg "access plus 2592000 seconds" 
  ExpiresByType image/png "access plus 2592000 seconds" 
  ExpiresByType text/css "access plus 604800 seconds" 
  ExpiresByType text/javascript "access plus 216000 seconds" 
  ExpiresByType application/x-javascript "access plus 216000 seconds" 
</ifmodule>

<ifmodule mod_headers.c>
  <FilesMatch "\\.(ico|pdf|flv|jpg|jpeg|png|gif|swf)$">
    Header set Cache-Control "max-age=2592000, public"
  </FilesMatch>
  <FilesMatch "\\.(css)$">
    Header set Cache-Control "max-age=604800, public"
  </FilesMatch>
  <FilesMatch "\\.(js)$">
    Header set Cache-Control "max-age=216000, private"
  </FilesMatch>
  <FilesMatch "\\.(xml|txt)$">
    Header set Cache-Control "max-age=216000, public, must-revalidate"
  </FilesMatch>
  <FilesMatch "\\.(html|htm|php)$">
    Header set Cache-Control "max-age=1, private, must-revalidate"
  </FilesMatch>
</ifmodule>
Alec Smart
The application is on the Google App Engine; there is no `.htaccess` (or Apache, for that matter)
balpha
Oops, I overlooked that. Anyway, I'm keeping it up in case it helps those with Apache.
Alec Smart
+4  A: 

You can do better. A 304 is still an HTTP request/response. Though the image is not downloaded again, the latency of the round trip can be a killer.

If you can include a version identifier in your image names, you can set the max-age to 2 years. That way, you prevent 304s. If the image ever changes, you update the version identifier thereby changing the file name. This ensures that the browser will issue a fresh request.

It needs some changes to your project structure. The version identifier can be the SVN revision number at which the image was last updated, and can be auto-generated at build time. You'd also need to update the HTML, so if you have a logical mapping between image name and image path, your job would be easier.

Images are rarely updated, so you could also follow a manual approach if you can't automate what I described above. The trick is to only add new images, never modify them.
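
As a rough illustration of the versioned-name idea, here is a small Python sketch. Note that it swaps the SVN revision number for a short hash of the image content, which is a different but common way to get an identifier that changes whenever the image does. The function name `versioned_name` and the example path are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

# One year in seconds, for a long-lived Cache-Control value.
ONE_YEAR = 365 * 24 * 60 * 60  # 31536000


def versioned_name(filename, content, digest_length=8):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def long_lived_headers():
    """Headers for a resource whose URL changes whenever its content does."""
    return {"Cache-Control": f"public, max-age={ONE_YEAR}"}
```

A build step could call `versioned_name("img/logo.png", image_bytes)` and write the result into the HTML, so a changed image automatically gets a new URL and the old cached copy is simply never requested again.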

sri
Those are some good ideas. The long max-age is something I'm definitely going to do; although I think I read that one year is the allowed maximum according to the RFC. The version identifier is something I have thought about, but unfortunately it's not an option. The app just serves the images to be included in other web pages (think like Gravatar).
balpha
+2  A: 

There is a very important value for the Cache-Control header that you have not mentioned here:

"post-check=900, pre-check=3600"

Read this article about this topic (and search for more):

http://www.rdlt.com/cache-control-post-check-pre-check.html

Aristos
That's interesting, thanks. A few things are worth mentioning: This is IE only, the post you link to itself links to a quite old article, and I don't find any mention of your "your clients browsers always ask..." claim; quite the opposite, in the comment threads there are many indications that IE does in fact honor the standard headers.
balpha
Maybe it is IE only; I do not know. What I do know is that in the developer tools of Google Chrome and Mozilla Firefox, I see that from the moment I add this value, the browser stops fetching the images every time. The truth is that I was working fast, and once I saw that it worked better, I didn't check it too deeply. Maybe the Chrome source code has a real answer as to whether these values are used or not.
Aristos
+1  A: 

I don't know that it'll help beyond what solutions others have offered, but you could use the HTML5 offline web apps facilities to more explicitly ask the browser to store a local copy.

Ken
That's a good idea. It won't help in my case (because I serve the images only, I don't control the HTML of the pages that include these images). But good to know for the general case. +1
balpha
+1  A: 

You could add an ETag representation for each image and then compare it to the If-None-Match header on inbound requests (see "Why isn’t my custom delivered image caching in the browser?"). This is redundant when using the preferred Last-Modified header and it's just another way to say 304 anyway. (I think GAE does this automatically for static files, not sure though.)
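
A minimal Python sketch of that ETag comparison might look like the following. The helper names are made up, deriving the tag from a SHA-256 of the image bytes is just one possible choice, and splitting If-None-Match on commas is a simplification that assumes the tags themselves contain no commas.

```python
import hashlib


def make_etag(content):
    """Build a strong ETag (a quoted string) from the image bytes."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def _strip_weak(tag):
    """Drop a leading W/ so weak and strong forms compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match, etag):
    """Return True when If-None-Match lists this ETag (or is '*')."""
    if not if_none_match:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return True
    # If-None-Match uses weak comparison, so ignore the W/ prefix.
    return _strip_weak(etag) in {_strip_weak(tag) for tag in candidates}
```

When `etag_matches` returns True, the server answers 304 with the same ETag header instead of sending the image again.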

Gravatar sets very old Last-Modified dates -- the default seems to be "Wed, 11 Jan 1984 08:00:00 GMT". The 5-minute expiration causes browsers to check for updated images frequently. In other words, I think they're inviting 304s, not trying to convince browsers to just use the local cache. Their headers look like this:

Date: Sat, 20 Mar 2010 07:52:43 GMT
Last-Modified: Wed, 11 Jan 1984 08:00:00 GMT
Expires: Sat, 20 Mar 2010 07:57:43 GMT
Cache-Control: max-age=300

The big difference is the expiration time -- you want two days, they want five minutes. So if you want browsers to just use the cached image for 48 hours, do what you're doing, only set Cache-Control: max-age=172800 (86400 is 24 hours).
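
To keep max-age and Expires from drifting apart like that, both can be derived from a single lifetime value. This is only a sketch; `freshness_headers` is a hypothetical helper, not part of any framework.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def freshness_headers(lifetime, now=None):
    """Build Cache-Control and Expires from one timedelta so they agree."""
    now = now or datetime.now(timezone.utc)
    max_age = int(lifetime.total_seconds())  # two days -> 172800
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": format_datetime(now + lifetime, usegmt=True),
    }
```

Calling `freshness_headers(timedelta(days=2))` yields `max-age=172800` together with an Expires date exactly two days ahead.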

benm
A: 

A cache age of a few days is very low. You should set it to one year or even more. Of course, this might cause problems when the image actually changes, but you can solve that by adding a version number to the image name and changing the page that references the image to point to the new path.

I wrote more about web application caching here: http://patchlog.com/web/7-methods-to-cache-web-applications/

Mihai Secasiu
Are you kidding me? I'm all for benefit of the doubt and I always appreciate any tips or help that people give me in their answers. But writing pretty much the same thing as Sripathi Krishnan did over a week ago (just with more detail) only to link to your loosely relevant blog post -- that I call spam.
balpha
This might be hard to believe but I didn't see his answer before I posted. And also my post has more ideas about caching but of course it's simpler to dismiss something and just vote down instead of reading a rather lengthy post.
Mihai Secasiu
Believe it or not, the downvote isn't mine. Still, your blog post (which I don't find lengthy at all; the text linked in the accepted answer is much longer) has nothing to do with my problem.
balpha
If the fact that I linked to my post, which is related to the question, is not your problem, then what is? The fact that my answer is a duplicate? I find it hard to consider an answer spam just because it's a duplicate. I didn't give this answer because I wanted to link to my post. I gave it because I thought it might help. I linked to the post only because I didn't want to rewrite something I had already written elsewhere.
Mihai Secasiu