I want to find a minimal set of headers that works with "all" caches and browsers (including when using HTTPS!)

On my web site, I'll have three kinds of resources:

(1) Forever cacheable (public / equal for all users)

Example: 0A470E87CC58EE133616F402B5DDFE1C.cache.html (auto generated by GWT)

  • These files are automatically assigned a new name (based on the MD5 of their content) whenever their content changes.

  • They should get cached as much as possible, even when using HTTPS (so I assume I should set Cache-Control: public, especially for Firefox?)

  • They shouldn't require the client to make a round-trip to the server to check whether the content has changed.

(2) Changing occasionally (public / equal for all users)

Examples: index.html, mymodule.nocache.js

  • These files change their content without changing the URL when a new version of the site is deployed.

  • They can be cached, but probably need a round-trip to be revalidated every time.

(3) Individual for each request (private / user specific)

Example: JSON responses

  • These resources should never be cached unencrypted to disk under any circumstances. (Except maybe I'll have a few specific requests that could be cached.)

I have a general idea on which headers I would probably use for each type, but there's always something I could be missing.

A: 

Cases one and two are actually the same scenario. You should set Cache-Control: public and then generate a URL which includes the build number / version of the site, so that you have immutable resources that could potentially be cached forever. You also want to set the Expires header a year or more in the future so that the client will not need to issue a freshness check.
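As a rough illustration of the far-future headers described above, here is a minimal Python sketch. The function name `far_future_headers` and the one-year lifetime are assumptions chosen for the example, not something prescribed in this thread; the Expires value is formatted as an HTTP date with the standard library.

```python
from email.utils import formatdate

# One (non-leap) year in seconds; an assumed lifetime for this sketch.
ONE_YEAR = 365 * 24 * 60 * 60


def far_future_headers(now: float) -> dict:
    """Return caching headers for an immutable, versioned resource.

    `now` is a Unix timestamp (e.g. time.time()).
    """
    return {
        # Understood by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={ONE_YEAR}",
        # Fallback for caches that only understand Expires.
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
    }
```

Calling `far_future_headers(time.time())` when building a response would give both headers; whether the Expires fallback is still worth its bytes depends on which clients and proxies need to be supported.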

For case 3, you could use all of the following for maximum flexibility:

"Cache-Control", "no-cache, must-revalidate"
"Expires", 0
"Pragma", "no-cache"
Different URLs for new builds are probably not an option: a) This would force the client to re-download the forever-cacheable files. They get unique names to avoid that. b) The main URL to my site should be just `https://www.example.com/` c) I want bookmarks to always refer to the newest version of my site (imagine, the bookmarks to a stackoverflow question would contain the build number of the site).
Chris Lercher
Hi Chris, this approach is generally used for CSS and JS resources rather than documents. I agree it's not applicable to document identifiers, in which case you should simply set Cache-Control: public, Last-Modified and ETag on the response, which will cause a freshness check each time, and only a 304 will be sent back if there are no changes since the last download. Alternatively, you could download the actual dynamic page content in each page via JS so you preserve the URL while still allowing effective caching.
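The freshness check mentioned in the comment above could look roughly like this on the server side. This is a simplified sketch, not a complete implementation: the names `make_etag` and `conditional_response` are hypothetical, the ETag is assumed to be a hash of the body, only a single exact If-None-Match value is handled, and `no-cache` is added here as an assumption to make caches revalidate on every use.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Strong validator derived from the content, wrapped in quotes
    # as the ETag syntax requires.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_response(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a revalidated resource."""
    etag = make_etag(body)
    headers = {"Cache-Control": "public, no-cache", "ETag": etag}
    # The client's copy is still current: answer 304 with no body.
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, headers, b""
    # First request, or the content changed since the client cached it.
    return 200, headers, body
```

With this shape, a single validator (here the ETag) is enough for the 304 path; Last-Modified would be handled analogously via If-Modified-Since.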
Yes, that's pretty much the way GWT handles this for me: My index.html (changing occasionally) includes mymodule.nocache.js (changing occasionally), which automatically includes the correct forever-cacheable files (large parts of the JS, GWT-managed image bundles, ...). The only thing it leaves to me is setting the correct HTTP headers for each type. I want to reduce these headers to a minimum, since they account for a large percentage of the transfer volume. So do I need e.g. both Last-Modified *and* ETag etc.?
Chris Lercher
+6  A: 

I would probably use these settings:

  1. Cache-Control: max-age=3155760000 – Representations may be cached by any cache. The cached representation is to be considered fresh for the next 100 years (with leap years).
  2. Cache-Control: no-cache – Representations are allowed to be cached by any cache. But caches must submit the request to the origin server for validation before releasing a cached copy.
  3. Cache-Control: no-store – Caches must not cache the representation under any condition.
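The three settings above can be sketched as a small lookup that picks a Cache-Control value per resource type. The file-name pattern and the set of revalidated files are assumptions taken from the examples in the question; the function name `cache_control_for` is hypothetical.

```python
import re

# 100 years in seconds, counting 365.25 days per year for leap years.
HUNDRED_YEARS = int(100 * 365.25 * 24 * 60 * 60)

# Assumed naming scheme: 32 hex digits followed by ".cache.<ext>",
# as in the GWT example from the question.
FOREVER_CACHEABLE = re.compile(r"^[0-9A-F]{32}\.cache\.[a-z]+$")

# Assumed set of files whose content changes without a URL change.
REVALIDATE_ALWAYS = {"index.html", "mymodule.nocache.js"}


def cache_control_for(filename: str) -> str:
    """Pick a Cache-Control value for one of the three resource types."""
    if FOREVER_CACHEABLE.match(filename):
        return f"max-age={HUNDRED_YEARS}"  # type (1): never revalidate
    if filename in REVALIDATE_ALWAYS:
        return "no-cache"                  # type (2): cache, but revalidate
    return "no-store"                      # type (3): never store
```

Defaulting to `no-store` for anything unrecognized is a deliberately conservative choice in this sketch, so that a misclassified user-specific response is not written to a cache.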

See Mark Nottingham’s Caching Tutorial for further information.

Gumbo
Makes sense, and looks very minimal. Question: Isn't Cache-Control an HTTP 1.1 header, while HTTP 1.0 only understands the Expires header (?) Should I still care about HTTP 1.0 proxies? And: Can I generally skip the "must-revalidate" directive?
Chris Lercher
@chris_l: I understand the values *s-maxage*, *must-revalidate* and *public* to only be useful when HTTP authentication/authorization takes place. Because if HTTP authentication/authorization takes place, a representation is automatically considered *private*, and these three values can change that.
Gumbo
@Gumbo: One thing I'm pretty sure about is that I need to set *public* when I want Firefox 3+ to cache public files to disk while using HTTPS: http://stackoverflow.com/questions/174348/will-web-browsers-cache-content-over-https
Chris Lercher
@chris_l: Sorry, but I don’t know anything about browser quirks.
Gumbo
Some browsers, such as IE, are starting to treat Cache-Control: no-cache as if it were no-store. This is admittedly not according to the RFC, but it is knowingly done to "fix" the mistake MANY sites make of using no-cache to prevent sensitive data from being stored unencrypted on disk.
AviD
@Gumbo: Your answer is already very helpful, and serves as a great starting point. Thanks also for the link! I also only know way too little about browser quirks, legacy proxies etc. - maybe somebody will come to our rescue :-) +1
Chris Lercher
@AviD: Which versions of IE are doing that - where can I find more information about this? And: Do you think that simply using "Cache-Control: max-age=0" could work better for my use case (2)?
Chris Lercher
@chris_l, I happened across this link: http://palisade.plynt.com/issues/2008Jul/cache-control-attributes/ . I don't remember how previous versions behaved, though I think IE7 did this too.
AviD